


BULLETIN OF THE SCHOOL OF 
EDUCATION, INDIANA UNIVERSITY 





Entered as second-class matter, September 30, 1924, at the post-office at Bloomington,
Indiana, under the act of August 24, 1912. Published six times a year, from the University Office,
Bloomington, Indiana.





Vol. I BLOOMINGTON, IND. No. 6 





Twelfth 
Conference on Educational 
Measurements 


JULY, 1925 

















For sale by the University Bookstore, Bloomington, Ind. 


Price, 50 cents. 


A limited number of copies of this bulletin will be distributed free to 
citizens of Indiana. 











TWELFTH ANNUAL CONFERENCE


ON 


EDUCATIONAL 
MEASUREMENTS 


Held at Indiana University, Bloomington, Ind., Friday
and Saturday, April 17 and 18, 1925


PUBLISHED BY 
THE SCHOOL OF EDUCATION OF INDIANA UNIVERSITY 


1925 











Contents 


PSYCHOLOGICAL SERVICE IN THE SCHOOL SYSTEM. By
Rudolf Pintner, Professor of Education, Teachers College,
Columbia University


THE PRESENT STATUS OF INTELLIGENCE TESTING. By
Rudolf Pintner


THE SCORING OF GROUP INTELLIGENCE TESTS. By Rudolf
Pintner


RECENT RESEARCH IN VOCABULARIES MOST NEEDED IN
ADULT WRITING. By Ernest Horn, Professor of Education


WHAT SHOULD TESTS MEASURE? By Ernest Horn


A STUDY OF HANDWRITING IN FORTY INDIANA CITIES. By
W. W. Black, Professor of Elementary Education, and John Dale
Russell, Graduate Student in Education, Indiana University


SUGGESTIONS ON VALUE AND USE OF ACCUMULATED
RECORDS OF GROUP INTELLIGENCE TESTS. By Herman
H. Young, Professor of Psychology, Indiana University


THE EFFECT OF POPULATION UPON ABILITY TO SUPPORT
EDUCATION. By Harold F. Clark, Associate Professor of
Education, Indiana University


A MEASURE OF THE LATIN ELEMENT IN THORNDIKE’S
TEACHER’S WORD BOOK. By Edward Y. Lindsay, Instructor
in Latin and Greek, Indiana University








Psychological Service in the School System 


RUDOLF PINTNER, Professor of Education, Teachers College, Columbia 
University. 


ONE of the striking features of education at the present time in 
this country is the great influence of the psychologist. There is great 
belief in the ability of the psychologist to solve many of the difficult 
problems in education. In no other country is the importance of the 
psychologist so great. In no other country is there so strong a feeling 
that psychology can help in the solution of practical problems as they 
arise in the schools. We find, therefore, an increasing number of people 
being called upon as school psychologists to devote their time to all 
sorts of problems, and I propose to discuss some of the varied prob- 
lems that have come to my attention during the present academic year 
with the hope that we may understand a little better what psychology 
can at present contribute to the practical work of education. 

I am afraid that in some quarters there exists the belief that 
psychology possesses magical power to dissolve instantaneously all the 
difficulties of the teacher and superintendent, and if we so believe, we 
are bound to be bitterly disappointed. I yield to no one in my hopes 
as to what psychology may contribute in the future, but I am forced 
to recognize the decided limits at the present time. There are, however, 
available at present several methods and instruments which, if rightly 
used, can be of decided help in the practical affairs of the school. 

Let us then take up some of the problems as the school psychologist 
is likely to meet them, and I shall illustrate by concrete situations, all 
of which have come to my attention during the present school year. 


I. The School that has never used Intelligence Tests. This will 
be a familiar situation to many school psychologists. Here is a school 
where no tests have ever been given; a principal who has heard about 
tests and feels now that the time has come to catch up with the pro- 
cession, and she makes an appeal to have “some tests” given. What will 
be most helpful to her? In all probability the funds available for the 
work will be limited. 

We must remember that in such a situation our function will be 
as much to educate the principal and her staff as to test the children. 
I think, therefore, that a general survey of the school would be most 
helpful. It should be as simple as possible. If we give too many tests 
and report back with elaborate statistics, we shall probably overwhelm 
the teachers, and altho they may be profoundly impressed with our 
learning and skill, they will probably do nothing about it, and in the 
end come to regard our tests as something too complicated for them and 
not sufficiently practical for everyday use. 


Here is a concrete example. In just such a situation as I have described,
a group intelligence test and a composite educational test
were recommended. After these were given, the results were used to 
illustrate and press home two general ideas: (1) the need for homo- 
geneous grouping according to mental ability, and (2) the desirability 
of trying to make each child work up to his intelligence level. The first 
idea is the important one for the principal to grasp, and the second is 
especially significant for each teacher. 

The first point was illustrated by means of graphs showing the 
distributions of mental ages for each class. The great overlapping that 
existed was pointed out. Thus we found in two rooms of the same 
grade children ranging in M.A. from 7-3 to 13-9 in the one case, and 
from 7-0 to 14-9 in the other. The medians and Q’s were almost the 
same. The attempt here was to impress upon the principal the fact 
that in all probability the work of the teacher would be more effective 
if the wide range of talent with which she had to deal could be reduced. 
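
The comparison of the two rooms rests on three summary figures for each class: the range, the median, and Q (the semi-interquartile range, half the distance between the first and third quartiles). A minimal sketch of that computation follows. The list of mental ages is invented for illustration, and the quartile convention (the standard library's "inclusive" method) is an assumption; the address does not say how the quartiles were located.

```python
from statistics import median, quantiles


def to_months(age: str) -> int:
    """Convert a 'years-months' age such as '7-3' into months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


def to_age(months: float) -> str:
    """Convert months back into the 'years-months' notation."""
    total = round(months)
    return f"{total // 12}-{total % 12}"


def summarize(mental_ages):
    """Return the range, median, and Q (in months) of a class."""
    months = sorted(to_months(a) for a in mental_ages)
    q1, _, q3 = quantiles(months, n=4, method="inclusive")
    return {
        "range": (to_age(months[0]), to_age(months[-1])),
        "median": to_age(median(months)),
        "Q": (q3 - q1) / 2,  # semi-interquartile range
    }


# Hypothetical room, not data from the survey described in the text.
room = ["7-3", "8-6", "9-0", "9-9", "10-3", "11-0", "13-9"]
print(summarize(room))
```

Two rooms with nearly equal medians and Q's, as in the case reported, would give nearly identical summaries here, which is what shows that the existing grouping did nothing to narrow the spread of ability.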

The second point, namely, the relation between the intelligence rat- 
ing and the educational rating of each child, was brought to the atten- 
tion of each teacher by preparing a special diagram for each class 
which accompanies the class record of the scores and ratings on the test. 
The diagram is essentially a scatter diagram, and each child’s position 
is indicated thereon by a number corresponding to his number on the 
class record sheet. From this diagram the teacher can immediately see 
those children who are, and those who are not, working up to capacity. 
She can also see those children who are above or below the norm either 
mentally or educationally, and, of course, there are many other inter- 
esting facts to be inferred from such a chart. But it is best, I believe, 
at this stage to lay emphasis upon such discrepancies between intelli- 
gence standing and educational achievement as may exist in the class, 
and to direct the attention of the teacher to these
particular children. I have found such a diagram easily understood by 
all teachers, and one in which they take a great interest. It is simple 
and direct and much easier to understand than a series of I.Q.’s, E.Q.’s, 
and A.Q.’s, certainly to anyone to whom these different quotients are 
unfamiliar concepts. 
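
For readers to whom the quotients are unfamiliar, the following sketch computes them and picks out the pupils who fall below capacity. It assumes the usual definitions of the period, which the address itself does not spell out: I.Q. = 100 x M.A. / C.A., E.Q. = 100 x E.A. / C.A., and A.Q. = 100 x E.A. / M.A. The pupils, the class name `Pupil`, and the 10-point margin are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pupil:
    number: int  # number on the class record sheet
    ca: int      # chronological age, in months
    ma: int      # mental age, in months
    ea: int      # educational age, in months

    @property
    def iq(self) -> float:
        return 100 * self.ma / self.ca

    @property
    def eq(self) -> float:
        return 100 * self.ea / self.ca

    @property
    def aq(self) -> float:
        # Achievement relative to mental level, not to birthday age.
        return 100 * self.ea / self.ma


def below_capacity(pupils, margin=10.0):
    """Numbers of pupils whose A.Q. falls more than `margin` below 100."""
    return [p.number for p in pupils if p.aq < 100 - margin]


# Hypothetical class records.
pupils = [
    Pupil(number=1, ca=120, ma=132, ea=132),
    Pupil(number=2, ca=120, ma=144, ea=120),
    Pupil(number=3, ca=120, ma=108, ea=114),
]
for p in pupils:
    print(p.number, round(p.iq), round(p.eq), round(p.aq))
print("Not working up to capacity:", below_capacity(pupils))
```

Pupil 2 is at the norm educationally (E.Q. 100) yet well below what the mental age would lead one to expect; that is the kind of discrepancy the scatter diagram makes visible at a glance.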


II. The School that has adopted Tests as a Routine. In such a 
school we may presuppose that teachers and principal are familiar with
all the usual technique of testing, and here, therefore, we may go much 
farther in elaborating our results. The specific school that I take as an 
example has given many tests for several years. All the children have, 
at some time or other, had Binet Tests. Here the program that was 
recommended was somewhat as follows: (1) Give a general group intelligence
test to all children to check up the Binet ratings. Wherever
a large discrepancy is found between group and Binet, retest on the
Binet, as in many cases the Binet Test has been given a year or more
previously. (2) Give the Stanford Achievement Test to all children at the
beginning and end of the semester to measure the progress of the school. (3) Concentrate
for one semester on any particular subject in which the school, as a whole, is
weak. (4) Study individual children that are not
making normal progress in the light of all these data. Here are the
median increases in A.Q. from October, 1924, to February, 1925, by
grade as a result of the stimulus given by this general testing program:


MEDIAN INCREASES OR DECREASES IN A.Q. FROM OCTOBER TO FEBRUARY

Grade   Reading   Arithmetic   Science   History   Language   Dictation   A.Q.   Median I.Q.
II         3          8           6         0        (-14)       -1         1        119
III        1          4           6         3         (-5)       -1         5        118
IV        -3          0           1         3           2         4         0        112
V          8          6          ...       ...        ...         7         8        112

















All of the median A.Q.’s in October were near or above 100, so that 
these increases, tho slight, represent substantial gains. Grade IV made 
no increase in A.Q. This grade was unfortunate in that it had three
different teachers during this one semester. 


III. Formation of a Class of Bright Children. Help in the selec- 
tion of bright children, so that a special class for such might be formed, 
was requested by one large city school. This is my experience in one 
such actual case. It was decided to start this class among the younger 
children so that it might be followed over a number of years. It was 
also decided to have the children fairly homogeneous in chronological 
age so that they might work together as a group. One semester was 
devoted to preliminary work for the purpose of selecting the cases. In 
April, 1924, a group intelligence test (The Pintner-Cunningham Primary 
Test) was given to all children in the second and third grades. The 
number tested was 180 and the median I.Q. of the group was only 88; 
Q1 was 78 and Q3 99. We see, therefore, that only the upper 25 per
cent of the group rated above an I.Q. of 100. From such results it was 
evident that there were not going to be many very bright children in 
this school. Nevertheless, I feel strongly that there is in every large 
city school a place for a special class of bright children, and bright 
should mean bright in relation to the intelligence of that school and 
not necessarily with reference to any specific I.Q. The brightest children
in any school can profit by being brought together for instruction.
Even if their I.Q.’s are not extremely high, they are high in relation 
to their fellow-pupils, and such children are being penalized by the 
ordinary classroom instruction, just as much as brighter children are in 
a school where the average I.Q. is very high. 

The group intelligence test was given to get a first rough estimate 
of the available material. Beginning now with the highest I.Q.’s on the 
group test, Stanford-Binet Tests were given to a great number of cases, 
going down the list of group test cases until it seemed highly improbable
that any further possible candidates would be discovered.
Eventually 54 children with I.Q.’s above 110 were found. The 35 who 
were nearest together in M.A. were then chosen for the bright class. 
The median I.Q. of this group was 117. The total range was from
110 to 138, and the interquartile range from 113 to 121. So it can be
seen that the group was not an extremely bright group. It was merely 
bright in relation to the average run of I.Q.’s in this particular school. 
Nevertheless, I believe it is desirable that such classes be encouraged 
wherever possible. The number of children chosen for the group was 
about the average number of children per room in the building in 
question. I felt that we should try to see what could be done with such 
a group under average school conditions, not giving them the advan- 
tage of being a small select group as had been the case in other experi- 
ments with bright children. If the formation of such classes for bright 
children is to become common, I believe that they must justify them- 
selves without claiming advantages which cannot in general be claimed 
by the other children in the school. Finally, the teacher selected by the 
principal for this room had no particular training for this work. She 
was simply assigned by the principal to the job. She professed interest 
in it, but was, I believe, at first somewhat skeptical and dubious of the
outcome. As the work progressed, she seemed to gain interest and 
enthusiasm. During the first semester, she taught the class by what 
we might call the standard rigid drill-recitation method. The freedom 
one would like to see in such a class was almost wholly absent at the 
start. A committee composed of students of mine kept in touch with 
the class and gave suggestions from time to time. 
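
The selection step described earlier (taking, from the children above I.Q. 110, the 35 who were nearest together in M.A.) can be read in more than one way. One plausible reading, offered only as an assumption, is to choose the group of the required size whose mental ages span the narrowest range. The sketch below does this by sorting the candidates and sliding a window over them; the names and ages are hypothetical.

```python
def tightest_group(candidates, size):
    """Pick `size` candidates whose mental ages span the smallest range.

    `candidates` is a list of (name, mental_age_in_months) pairs.
    After sorting by mental age, the best group is always a run of
    consecutive entries, so it is enough to compare each such run.
    """
    if size > len(candidates):
        raise ValueError("not enough candidates for a group of that size")
    ranked = sorted(candidates, key=lambda c: c[1])
    best_start = min(
        range(len(ranked) - size + 1),
        key=lambda i: ranked[i + size - 1][1] - ranked[i][1],
    )
    return ranked[best_start:best_start + size]


# Hypothetical candidates, all assumed to have passed the I.Q. cut-off.
candidates = [("a", 96), ("b", 110), ("c", 101), ("d", 103), ("e", 120), ("f", 99)]
print(tightest_group(candidates, 3))
```

With 54 eligible children and a class of 35, the same function would be called as `tightest_group(children, 35)`.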

In spite, then, of these very ordinary conditions under which the 
class was conducted during the first semester, it is pleasing to note the 
very decided improvement in standard subjects made on the Stanford 
Achievement Test. A repetition of the test in January, 1925, a little 
over three months after the first test in October, 1924, showed a uni- 
form increase in the educational age in each subject in the test. While 
the median increase in C.A. was 4 months, the median increase in read- 
ing age was 7 months, in arithmetic 12 months, in spelling 6 months. 
The median increase in the educational quotient was 2 points. The A.Q. 
was 97 in October and this rose to 106 in January, an increase of 9 
points. The Q or semi-interquartile range for the achievement quotients 
was cut down from 8.5 to 5.5. 

These are not great or wonderful gains, but they are substantial, 
and they have been achieved very easily and simply. That is the 
point I should like to stress. A more homogeneous grouping, a little 
appreciation on the teacher’s part of the rôle of intelligence in the learning
process: that is all. The gains I have quoted above are gains in
the easily measurable aspects of instruction. I venture to suggest that 
if we had had objective measures of self-control, love of school work, 
self-respect, initiative, and the less tangible elements of character and 
temperament, the class would have registered appreciable gains like- 
wise in these things. The class is still continuing this semester, and the 
work is rapidly becoming less formal; the children are beginning to 
contribute more and more on their own initiative. I think it should be 
the endeavor of the school psychologist to promote in every way such
attempts at the formation of classes for bright children, so that they 
may in time come to be taken for granted, just as classes for specially 
dull children are at present. 







IV. The I.Q. of High School Children. One of the numerous problems
confronting the psychologist with reference to intelligence tests in
the high school is the question of calculating the I.Q. of high school 
children. The I.Q. technique is now so well known and so universal 
that there are advantages in keeping to it, altho we are rapidly coming 
to the opinion that it will in the future be superseded by some method 
of indicating the mental rating of the individual in terms of the variabil- 
ity of the group. In the meantime, however, the school psychologist 
must meet the practical situation. Teachers in general understand the 
meaning of I.Q. and insist on using it. This is all right, and no harm is 
done in the grades. But when we come to the high school level, we are 
troubled with the problem of the best divisor for calculating the I.Q. 
of the adolescent. Should we use 16 or 15 or 14? This, you know, de- 
pends upon the limit of growth of intelligence of the average individual, 
or, better stated, the degree to which our present tests of intelligence 
are capable of measuring the limit of growth. I do not intend here to 
go into this theoretical problem, because it cannot be settled until we 
have further data. The practical issue at present is, Shall we use age 
16 in calculating all I.Q.’s of children age 16 and above as has been the 
custom since the introduction of the Stanford Binet Scale, or, taking 
cognizance of the results of the army testing, shall we use 15 or 14 
as the divisor in calculating the I.Q.’s of children 15 and above, or 14 
and above respectively? What divisor we use will materially affect our 
arrays of I.Q.’s in high school, for it is just here that we find large 
numbers of children of these critical ages, namely, 14, 15, 16, and above. 
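
The practical effect of the choice of divisor can be seen from a short calculation. The sketch below assumes the customary rule described above: the I.Q. is 100 times mental age over chronological age, with the chronological age replaced by the chosen limit (16, 15, or 14 years) for any child at or above that age. The function name and the two pupils are hypothetical.

```python
def iq(mental_age_months, chronological_age_months, base_years=16):
    """I.Q. with the divisor capped at `base_years`."""
    divisor = min(chronological_age_months, base_years * 12)
    return round(100 * mental_age_months / divisor)


# Hypothetical pupil aged 17-0 with a mental age of 16-0.
for base in (16, 15, 14):
    print("base", base, "I.Q.", iq(192, 204, base))

# Hypothetical pupil aged 13-0 with a mental age of 13-0:
# below every limit, so the choice of base makes no difference.
for base in (16, 15, 14):
    print("base", base, "I.Q.", iq(156, 156, base))
```

The older pupil moves from 100 to 107 to 114 as the base is lowered, while the younger pupil stays at 100. Since the upper years of the high school contain proportionally more pupils past the limit, the base chosen changes not only the level of the I.Q.'s but their trend from year to year.
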
For many reasons I recommend the use of age 14. Some of my 
reasons are based on the results of the army testing; others on the 
results of the Army Alpha Test given to school children in many parts 
of the country; others on the results of intelligence tests used in in- 
stitutions for delinquents and the like. Just now, however, I wish to 
call your attention to some data which are not conclusive in themselves 
but are at least suggestive. Let us ask the practical question as to 
what results we will actually get by using 14, 15, or 16 as a base for 
the calculation of our I.Q.’s.
Here are the median I.Q.’s on the Terman Group Test for each year
of high school for several schools based on 14, 15, and 16 as divisors.


MEDIAN I.Q.’s IN HIGH SCHOOLS ON THE TERMAN GROUP TEST

                      School
Base    Year      A      B      C      D      E
14      I        113    114    117    121    121
        II       117    124    124    123    124
        III      120    127    125    125    130
        IV       123    132    130    128    129

15      I        ...    107    116    117    118
        II       112    116    116    116    118
        III      111    117    117    117    122
        IV       116    123    121    119    120

16      I        111    102    116    117    118
        II       111    109    113    114    115
        III      106    111    108    111    116
        IV       109    116    114    ...    ...











Now, if we take the conventional 16 as a base for calculating our 
I.Q.’s of all children 16 and above, we find that most of the schools show
a tendency to decrease in the median I.Q. from the first to the fourth 
year. With 15 as a base, this tendency to decrease is not nearly so 
marked. There is great irregularity, and there is no uniform tendency 
for the I.Q. to rise from year to year. It is only when we come to use
14 as a base that the tendency to rise uniformly from year to year is 
apparent. If the selective power of the high school is in reality suffi- 
ciently strong so as to eliminate the duller students, then it would 
seem as if the use of 14 as a divisor would fit in best with this sup- 
position. 
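
The trend argument can be checked directly against the tabulated medians. The sketch below uses the figures for base 14 (all five schools) and, for base 16, only Schools A, B, and C, whose four yearly entries are all given; the helper names are hypothetical. It reports the net change from the first to the fourth year and whether the median rises in every successive year.

```python
# Median I.Q.'s for years I-IV, as tabulated.
BASE_14 = {
    "A": [113, 117, 120, 123],
    "B": [114, 124, 127, 132],
    "C": [117, 124, 125, 130],
    "D": [121, 123, 125, 128],
    "E": [121, 124, 130, 129],
}
BASE_16 = {
    "A": [111, 111, 106, 109],
    "B": [102, 109, 111, 116],
    "C": [116, 113, 108, 114],
}


def net_change(medians):
    """Fourth-year median minus first-year median."""
    return medians[-1] - medians[0]


def rises_every_year(medians):
    """True when each year's median exceeds the year before."""
    return all(later > earlier for earlier, later in zip(medians, medians[1:]))


for label, table in (("base 14", BASE_14), ("base 16", BASE_16)):
    for school, medians in table.items():
        print(label, school, net_change(medians), rises_every_year(medians))
```

On base 14 every school gains from the first to the fourth year, and four of the five rise in every single year (School E slips one point between the third and fourth). On base 16, Schools A and C end lower than they began.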

Again, a study of the actual distributions according to the three 
bases of calculating the I.Q. is instructive. In School A on the 16-year
basis, we find I.Q.’s ranging up to 150 in the first year; up to 140 in 
the second; 130 in the third; and only up to 125 in the fourth. In other 
words, the picture we get would seem to indicate a gradual dropping out 
of the children of high I.Q.’s as we advance up thru the school. This 
would surely seem directly at variance with our general assumptions. 
And this is more or less true of all the schools. 

Using age 14 as a base, we find the upper limit for School A in the
first year to be 150; in the second, 140; in the third, 140; and in the 
fourth, 140. There is here practically no decrease and no tendency for 
the high I.Q.’s to drop out or decrease as we progress from the first to 
the fourth year. 










V. Individual Cases. Another type of service that the school psy- 
chologist is called upon to perform is the special study of individual 
cases for the purpose of advice and guidance. Indeed one might say 
that the big function of group intelligence testing is ultimately to call 
the attention of the teacher to the individual and his specific needs. Many 
cases cannot be helped by general classification and grade placement 
resulting from the mass methods of the group intelligence test. After 
all, it is the individual that we are trying to reach, influence, and modify 
by all these methods, and so there come to the attention of the school 
psychologist numerous children who are in some way or other mal- 
adjusted either at home or in school. 

For such children, a psychological or educational clinic for the in- 
tensive study of individual cases is necessary. We have started such 
a clinic at the Institute of Child Welfare at Teachers College, and an 
attempt to group the cases that have come to us during the past few 
months is instructive. I have grouped them under five headings. 


1. Education of the Parent. In this group, the difficulty with the 
child, usually called a behavior difficulty, seems to result mainly from 
bad management or misunderstanding on the part of the father or 
mother or both. In other words, the child’s behavior cannot be modi- 
fied until there has first taken place a modification of the behavior of 
the parent. To clear up such cases, educators and psychologists will 
be forced to penetrate into the home and educate the parents. This 
is particularly true with reference to the very young child. The present 
intensive work on the part of psychologists and educators with reference 
to the pre-school child is due to the fact that we are beginning to 
realize, as never before, how important the early behavior of the child 
is in forming its conduct in later years, and the one big factor in the 
determination of the early behavior of the child is, of course, the be- 
havior of the parents. 

Here is the case of a little boy with an I.Q. of 108 who is brought 
to us because he is uncontrollable. He has to be coaxed to come to the 
clinic. When he comes, he refuses to go upstairs to the examining room, 
but his mother tells him there are movies up there, and a little later 
that there is a birthday party in the examining room. The mother lies 
to him in this manner constantly and bribes him with candy. The home 
visitor reports that the mother threatens, coaxes, whips, and shouts at 
the boy continually, and the child, I think I may say normally, reacts 
by shouting, sulking, fighting, and going into fits of temper. Here, as 
I see it, the child is having a splendid training in what will later be 
called incorrigibility and juvenile delinquency. Obviously this is pri- 
marily a case of the education of the mother, and this we have begun 
so far as we are able. I hope in the course of time that we shall be 
able to make intelligence tests of parents as well as of children. It is 
badly needed in a case such as this. 

Another case of tantrums and incorrigibility is that of a child 
with an I.Q. of 97. He is obstinate, will not eat properly, and will not
sleep. The mother reported that she sometimes spends two or three
hours in reading and story-telling before the child goes to sleep. As
the mother seemed to be dominated by the child, it was recommended 
that he be sent to a nursery school. This was done. The teacher re- 
ported that the child ate and slept well the first day, and that she 
does not think the child at all peculiar or difficult. A few days later 
the mother reports that the child goes to bed and to sleep quite willingly. 
He says this is the way he does at school. Here again the main prob- 
lem is that of the mother, who has lost all authority or influence over 
the child at this early age. 

Another, but different, type of bad home influence is to be seen in 
an older child, a girl of nine years, with an I.Q. of 141. The mother
thinks the child a prodigy, and she lets the child and everyone else 
know it. She is an only child, and she is told at home and by mis- 
guided adult friends that she is a genius. She spends a lot of time 
typewriting her poems. Her one and only aim in life is to excel in her 
work. The absurdity of her attitude was shown by the fact that she 
declared that she would not let herself get a “bad mark” on her 
metabolism test, and she ate only the most nourishing foods for a week 
before this test. She is cock-sure and egotistical and in many ways a 
thoroly objectionable little girl. Here we see strikingly how a child 
of splendid intelligence has been harmed by her mother’s behavior. 
Again the case is predominantly a case of educating the mother, for
without effecting great changes in the mother’s behavior, we can hope
to do little with the child.


2. Emotional Instability. Under this heading, we have children
whose non-intellectual reactions, particularly their affective and emo- 
tional reactions, are misplaced or abnormal. Let me specify by an 
actual case. A girl of 17 is brought to the clinic because she threatens 
to commit suicide. Being too old and too intelligent to be tested ade- 
quately on the Binet, she is given the Army Alpha Test and makes a 
score equal to the norm for postgraduate students. She attends a high- 
grade private school and stands in the upper quartile in her studies. 
She reads incessantly and among other things Karl Marx. She lives 
with her aunt, complains of frequent headaches, and in disposition is 
very melancholy. Physically she is much below par. Now contrast this 
picture of mature intelligence with her personal appearance. She 
dresses like a child, with abundant curls falling over her face and neck. 
When questioned by the examiner as to who curls her hair every day, 
she blushes furiously and is very self-conscious. Further questioning 
elicits the information that she is afraid of growing up. Ever since 
childhood she has had this dread. She says she can remember lying 
awake the night before her tenth birthday and saying to herself, “I am 
not ten yet, not for several hours.” This dread of growing up and 
facing the adult world is constantly with her. Because of this childish 
attitude she is socially out of touch with the other girls in her class, 
and so she is thrown back upon herself and devotes her time to very 
mature reading and study. Delving back into her personal history, we 
find a long record of severe illness, during which she was for long 
periods confined to bed. In all probability it was during this time that 
there grew up within her a liking for the status of childhood. She
learned to find great satisfaction in reactions on this level, and she 
has not been able to modify them as she grew older. As I have said, 
she is of high intelligence. She has splendid insight into her condition, 
but she says she “cannot help” these attitudes; she dreads facing adult 
life; there is an acute conflict, and hence the impulse to suicide in order 
to end the conflict. Because of her insight and intelligence, there is 
much hope of solving this conflict and of helping the girl to readjust her 
emotional life to the actual realities about her. Such a re-education 
of her emotional reactions is now taking place. 


3. Special Disabilities. Under this heading come a number of
cases of children of unequal mental development. Most common of all 
are the so-called reading disabilities, and the causes of such disability 
are evidently numerous. Each case must evidently be considered on its 
own merits and a particular type of treatment followed. Here is a 
specific case: A boy chronologically 7 years 3 months, mentally 7 years 
6 months, intelligence quotient 103. On the Performance Scale, how- 
ever, he obtains a mental age of 11. Here we have a marked dis- 
crepancy between verbal or abstract intelligence, on the one hand, and 
practical or concrete intelligence on the other. Special instruction in 
reading by a graduate student in order to overcome some bad habits 
he has acquired, along with large doses of encouragement, seem now to 
be on the way to straightening out this boy’s difficulties. He has normal 
intelligence, and there seems no reason why he should not very soon be 
able to read properly. 


4. Poor Mentality. Under this heading I group those cases where 
the difficulty or problem that has arisen is due to the fact that the 
child has not enough mentality for the situation in question. These are 
not necessarily cases of feeble-mindedness or indeed backwardness, but 
of relatively low mentality. Here is a case of a boy with an I.Q. of 101. 
He has just been dismissed from a high-grade private school, where the 
average I.Q. is 115. He seems to be developing into a difficult behavior 
case. Here the trouble would seem to be that the child’s intelligence is 
too low to compete adequately with the children of high intelligence in 
the school in question; as a result he is developing a number of ob- 
jectionable conduct reactions. The thing to do with such a case would 
seem to be to try him out among a group of children more nearly his 
equal in intelligence; give him a chance to react adequately on the 
intellectual plane; let him experience success in his school studies, such 
as he has not up to the present time. 

Another case was that of a child being considered by well-educated 
and intelligent people for adoption. The child was a healthy, well- 
nourished, smiling child of 3 years and 10 months. A slight infantile 
lisp made her still more attractive to the maternal instincts; altogether a
very lovable little child. The result of the mental test showed an I.Q.
of only 75, and everybody was surprised, because all who saw the child 
were inclined to think her at least normal, and probably above. The 
same intelligence test was, therefore, given again two days later by 
another examiner, and in spite of the practice gained by the first test, 
liberal encouragement by the examiner, and the most favorable testing
conditions, the I.Q. could not be raised above 82. Under such circumstances,
the prospective foster parents were strongly advised not to adopt the
child, because her mentality was too poor to obtain such success in
school as they would expect and hope for.


5. Medical Cases. The last group of cases are those that are pri- 
marily medical in nature. These may consist of such defects as poor 
eye-sight, leading to poor school work or bad behavior. Or else there 
are children who are found to require hospital treatment for the clear- 
ing up of certain physical conditions. These cases are not numerous 
in a clinic such as ours, but the fact that they do occur makes imperative,
I believe, a thoro physical examination of every case coming to
the educational clinic, and this has become our established mode of
procedure.


Conclusion. I have tried, then, to indicate some of the very varied 
types of service that psychology can render to the school. This service 
ranges from general studies of a statistical nature of large groups of 
children down to the minute study of the individual. All the work may 
be considered as an attempt to understand better the great individual 
differences that exist among children, so that we may help the individual 
to adjust to conditions that cannot be changed, or else change conditions 
so that they may better influence and educate the individual. The psy- 
chologist must take a wide view and be ready to use all the tools at 
his disposal. One type of work cannot be said to be any better than 
another. Each specific sort of study, whether a general survey by means of
group tests or an intensive study of an individual, has its own particular
value, and all types of studies are needed for a better and wider under- 
standing of the individual. 










The Present Status of Intelligence Testing 
RUDOLF PINTNER 


INTELLIGENCE testing has during the last few years aroused a great 
deal of interest, not only among educators, but also among the public 
in general. Those of us who have been connected with such work during 
the last fifteen years have seen the intelligence test emerge from ob- 
scurity, pass thru a stage of general skepticism and ridicule, and eventu- 
ally arrive at a respected and assured place in education. Recently the 
implications of intelligence testing have been widely discussed. Such 
discussion has been effective in placing sharply before us certain issues 
in this field. It has shown us more definitely where we have gone astray 
and been inclined to generalize too widely without sufficient data. It 
has above all brought to light very clearly a number of particular prob- 
lems, which many had been inclined to imagine could be readily solved 
by means of our tests. Worst of all, or best of all, according to your 
point of view, it has brought the implications of the testing movement
into contact with such all-embracing questions as the relative importance 
of heredity and environment, the function of education in general, and, 
lastly, the real meaning of democracy. 

All such general discussion has undoubtedly served some useful 
purpose, but I am inclined to believe that for the time being we have 
had enough of it. We may go on endlessly repeating our beliefs and 
opinions, and in the end they will merely remain beliefs and opinions. 
Mental and educational measurement needs to withdraw from the heated 
air of the public debating room. Time for further study and research 
is necessary to the end that we may ultimately have more knowledge 
and facts, and when we have found these additional facts, we may then 
revise our previous opinions and hopes and fears in the light of them. 

It is rather interesting to consider how, in the short space of fifteen
to twenty years, the intelligence test has grown from a mere toy into 
a widely used and potent instrument, or to use a more picturesque meta- 
phor, how the Cinderella of Psychology has become a recognized and 
powerful princess. Spearman sees in intelligence testing “the most live
and futureful shoot of all contemporary psychology”. 

Intelligence testing began, you remember, in the very simple need 
of differentiating more exactly between normal and feeble-minded chil- 
dren. Instead of depending upon subjective guess-work or upon one’s 
past experience with children, one used the tests to find out definitely 
whether a given child could or could not do certain things which children 
in general could do at a given age. This is behavioristic psychology 
before the advent of behaviorism. It was the application of the methods 
of animal psychology to human psychology. Record accurately the
responses made to a given stimulus, and do not trouble to ask the
subject how or why he responded in a given manner or what it felt like. 

Having found that a child could not respond to certain tests appro- 
priate to his level, the conclusion was that he was not as capable as 
his fellows, that he possessed less intelligence than they did. The tacit 
assumption underlying this conclusion is, of course, that the child in 
question has had substantially the same opportunities to learn and grow 
as have the other children with whom he is being compared. This being 
the case, what he brings to a test, namely, his intelligence, must be 
less than that of the others. A child’s intelligence then is estimated by 
means of a comparison of his reactions with those of others who grow 
up in the same type of environment. This brings up at once the next 
question, What do you mean by a similar type of environment? Is the 
American environment similar for all children? This is a matter of 
relation. In relation to what? Thinking of all the children in the world, 
we might readily say that American children have a similar environ- 
ment as contrasted with Esquimaux or Chinese children. So within the 
United States we may say that city children live in similar environment, 
as contrasted with country children; that the environment of deaf chil- 
dren is similar as contrasted with hearing children; and so on for any 
groups such as rich and poor, foreign-born and native-born, children 
attending school and those not attending, and so forth down to smaller 
and smaller groups, such as family groups. Make the group small 
enough and you end up logically with the individual himself living in a 
unique environment.

Now all this has a very decided bearing upon the kinds of situa- 
tions or stimuli that can be used as intelligence tests. What will or 
will not serve as a good discriminator among children of different levels 
depends upon the group of children we are testing. The test must be 
something that comes within the environment field of the children con- 
cerned: it must be equally familiar or equally novel to all. If we are 
dealing with school children in a given school, many of our educational 
or achievement tests are fair measures of intelligence. This is not so 
true of comparisons of different schools where differences in methods 
and teachers would disturb the similarity of the environment, in the 
sense that I am using that term. But the environment among all chil- 
dren of these schools is still similar enough to make our ordinary group 
intelligence tests effective measures of individual differences. If we still 
further increase the scope of our testing and wish to include deaf and 
foreign-born children in our group, then a great many of the stimuli 
that made suitable intelligence tests for our school children must be 
omitted. We must leave out all verbal situations—we must change to
non-verbal tests—because the language environment of the foreign or
deaf child is not similar to the language environment of the native 
hearing child. And if, ultimately, we are ambitious to construct a uni- 
versal intelligence scale suitable for man in general upon this earth, 
we shall have to choose situations (performance tests or pictures) which 
are common to human beings in all quarters of the earth. Language 
tests of course go; so do numbers and the use of pencil and paper.
Perhaps it cannot be done. But we can approach this and strive to
make our tests of wider and wider universality. Upon this problem 
of a universal intelligence test, Dr. Brigham and associates are at 
present at work and they have already made good progress. 

The question, therefore, as I see it, is becoming a problem of what 
scale to use for a given group of children. The National Group Intelli- 
gence Test may be a thoroly good test of intelligence for a comparison 
of any native-born children in the United States who have begun
attendance at school at about the age of six or seven. But it would not
be the best test or so good a test of intelligence where there is a large 
number of foreign-born children. Perhaps we shall, as this work pro- 
ceeds, be able to limit more definitely the groups within which a test 
can be used effectively. Maybe we shall have a test suitable for urban 
children, another for rural, and so on. At any rate, we shall need to 
know for the practical use of any test the limits within which the test 
remains a fair measure of intelligence, assuming a certain amount of 
opportunity to learn as being common to the group in question. 

These are the two directions, then, that it seems to me we shall 
have to follow in order to solve the problems that are confronting us. 
The one path leads to the construction of scales whose elements become 
ever more universal in their nature, assuming only experience that is 
common to ever larger and larger groups of individuals. The other 
path will lead to a more specific statement for each intelligence test as 
to the groups of individuals for which it is a valid measure of intelli- 
gence, or rather the degree of validity as a measure of intelligence 
must be known for different groups of children. 

The latter path or procedure that I have suggested is, of course, 
merely another way, I think a better way, of raising the question as to 
the effect of schooling upon an intelligence test. This is more specifically 
the way that Burt and his students have stated the problem. Burt’s 
work with his English Revision of the Binet Test resulted, as you know, 
in attributing one-half of the mental age to the influence of schooling, 
using a very narrow and highly restricted criterion of intelligence, 
namely, ability to solve syllogistic puzzles with verbal material. More 
recently one of his students, Gordon, has continued the same type of 
investigation in a highly ingenious way. Using a slight modification of 
the Stanford Revision, he has tested children attending schools for the 
physically defective, canal boat children, and gipsy children. The first 
group attended school 48 per cent of the time, the gipsy children 35
per cent, and the canal boat children 5 per cent. He finds that as the
attendance of the child tends to become very irregular, the I.Q. tends to
decrease. Schooling or lack of schooling affects the score on the test. 
There are several things in the test that are more likely to be taught 
at school than at home. More significant still, Gordon finds that the 
I.Q.’s of children tend to decrease as we go from the youngest child in
the family to the oldest. The greater the lack of schooling, the greater
the decrease in I.Q. Up to the age of six or thereabout the I.Q.’s of
these children are not significantly low. The average I.Q. of the youngest
canal boat children is about 87; of the next oldest group 77; of the
next 72; and of the oldest children in each family 60. The same relation
holds for the gipsy children.

Looked at from the point of view outlined by me before, I should 
say that we have here evidence of the fact that the Stanford Binet is 
an intelligence test for school children, and that it begins to function 
less and less adequately the greater the lack of schooling becomes. Note 
carefully that it was standardized on non-school children up to the age 
of five and on school children from six upwards. It obviously contains 
elements that are directly or indirectly taught in school, and to that 
extent it is not a good test of intelligence for canal boat and gipsy 
children. The intelligence of such children might better be measured 
by performance or perhaps non-verbal tests, at any rate, by tests which 
use only elements common to the environment of such children as well 
as of school children. The Stanford Binet still remains an excellent 
intelligence test for American school children. 

Closely allied to this question of the influence of schooling is the 
narrower question of the influence of practice or familiarity with any 
given test material. Some have tried to maintain that intelligence tests 
are impervious to practice effect. This is absurd. It contradicts one of 
the fundamental laws of all learning. Experientia docet, and there is 
no part of experience which is exempt. The practical question then be- 
comes one as to the varying amounts of effect which varying amounts 
of practice or familiarity exert. To give the same child a Binet Test 
every day for ten days will undoubtedly increase the child’s I.Q. To 
encourage the child to find out the answers to the tests will increase it 
more. To teach the child material similar to the tests will lead to fur- 
ther increase; and deliberately to drill the child on the answers to the 
tests will cause the greatest increase of all. One of our students,* Dr. 
Graves, conducted such an experiment and found that immediately after 
intensive coaching on the tests, the mental age was increased by 23 
months. Three months later the M.A. was still 19 months above ex- 
pectation, and at the end of one year, the effect of the coaching was still 
visible by an excess of nine months’ mental age. 
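
How many I.Q. points such gains represent depends on the age of the child. The sketch below is only an illustration: it assumes the usual ratio definition (I.Q. = 100 × mental age ÷ chronological age) and a purely hypothetical child, exactly ten years old at the first testing, whose uncoached mental age would equal his chronological age. The function names and the chosen ages are assumptions for illustration and are not taken from Dr. Graves's study.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio I.Q.: 100 times mental age divided by chronological age."""
    return 100.0 * mental_age_months / chronological_age_months


def iq_after_coaching(chronological_age_months: float, excess_months: float) -> float:
    """I.Q. of a hypothetical child whose uncoached mental age would equal his
    chronological age (I.Q. 100), given an excess of mental age from coaching."""
    expected_mental_age = chronological_age_months  # assumption: baseline I.Q. of 100
    return ratio_iq(expected_mental_age + excess_months, chronological_age_months)


# (hypothetical age in months at testing, excess mental age in months from the text):
# immediately after coaching, three months later, and one year later.
COACHING_EFFECTS = [(120, 23), (123, 19), (132, 9)]

for age, excess in COACHING_EFFECTS:
    iq = iq_after_coaching(age, excess)
    print(f"age {age} months, excess {excess} months: I.Q. about {iq:.0f}")
```

For a younger child the same excess in months would correspond to a larger change in I.Q., and for an older child to a smaller one.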

Dr. Woolley also shows that there is a certain increase in I.Q. 
among children attending her nursery school, such as does not occur 
among a control group not attending the school. Here we see the 
effect of using a scale standardized on non-school children for a group 
that has been subjected to a new type of school, in which very probably 
responses used in the test are directly or indirectly taught in the school. 
In the work of Dr. Graves and Dr. Woolley, it will be important for 
us to know how long the effects of coaching or of particular training 
may be expected to last. 

This whole problem of the influence of training on our different 
intelligence tests is important. So far the work has been restricted 
mainly to the Binet Tests, but we must, as I have indicated before, 
broaden the scope of our inquiry and ask how much each kind of training
or lack of it influences each type of test, verbal, non-verbal, individual,
or group.


* Graves, K. “The Influence of Specialized Training on Tests of General
Intelligence.” Teachers College Contributions to Education, No. 143. Teachers
College, Columbia University, N.Y., 1924.

The great activity in intelligence testing during the last few years 
has resulted in much discussion as to the meaning or definition of in- 
telligence, and this phase of the question illustrates extremely well how 
the practical applications of a subject may stimulate the theory. At 
the present time, no one theory or definition of intelligence is generally 
agreed upon, and in all probability we must wait for more practical 
work with the tests themselves. 

I believe, however, that a general understanding of intelligence 
which may lead to an acceptable theory is gradually growing up around 
what I may call a description of intelligence. This description of in- 
telligence is largely due to Thorndike and has grown directly out of his 
work with intelligence tests. Intelligence can be thought of as divided 
into various types; it can also be thought of as possessing certain 
attributes. 

Three kinds of intelligence that we are coming to differentiate are 
abstract, concrete, and social. Abstract intelligence is the ability to 
respond to words, numbers, and symbols, and it is the type of intelligence
that is tested by most of our present intelligence tests. Concrete
intelligence is the ability to respond to actual things—tables, chairs,
automobiles, and so forth, and we may test this type of intelligence 
by performance tests or by picture tests. Social intelligence is the 
ability to respond to people, to understand and handle people. Tam- 
many Hall undoubtedly possesses much of this social intelligence. Indi- 
viduals possess it in varying degrees, as is shown by their ability to 
understand and get along with other individuals. Executives and ad- 
ministrators should have high amounts of it. At present we have no 
test for this type of intelligence. 

As to the inter-relationship of these types of intelligence, we know 
little. What tests of concrete intelligence we have, and there are very 
few, always show positive correlations with abstract intelligence to the 
extent of .40 or .50 in homogeneous groups, and probably .70 in the total
range of talent. The chances are that there is a positive relationship 
among these three types, but not a perfect one. 

To turn now to the aspects of intelligence, we find Thorndike sug- 
gesting that we should differentiate between three and possibly four 
aspects or attributes of intellectual activity: (1) the speed at which the 
activity can be performed; (2) the level or difficulty of the activity in 
question; (3) the range or number of similar activities of equal dif- 
ficulty that can be performed; and (4) the method which is used in the 
solution of the response in question. 

At the present time all of these aspects of intelligence are combined 
in our tests, and the weight given to each is unknown. A better de- 
scription of the intelligence of an individual will undoubtedly be ob- 
tained if we can keep them separate. We need, furthermore, to know 
the relation between these attributes. In all probability this is positive 
in every case. We know that speed correlates with level. One of our 
graduate students finds that this correlation is about .40 for homogeneous
groups. This is a correlation between speed of performance at
tasks of minimum difficulty with ultimate level of difficulty attained on 
those same tasks. In some of our group intelligence tests we have un- 
doubtedly weighted speed too highly, and the general tendency now 
seems to be to stress level or difficulty as being the more important as- 
pect of intelligence. Level of intelligence is undoubtedly the aspect least 
influenced by training, whereas range is very largely a matter of op- 
portunity. The number of things of a given level that I can do is, to a 
large extent, a question of the opportunity I may have had of becom- 
ing familiar with them. Yet, at the same time, I cannot help but be- 
lieve that an individual of a higher level of intelligence will have a 
wider range on lower levels as compared with an individual on one of 
these lower levels, schooling and opportunity being equal, simply be- 
cause the individual of the higher level will bring to the solution of un- 
familiar intellectual tasks a greater keenness of mind. On the other 
hand, no increase of range thru training can be conceived of as increas- 
ing the level of intelligence. 

Method is the last attribute of intelligence and has only recently 
been suggested by Thorndike. I presume that the idea here is that a 
given task may be performed correctly in many ways, and that these 
ways or methods can be thought of as ranging from very poor to very 
good. Wherever it is possible to discover the method of work of a 
given individual, it would be feasible to assign a score for the method 
concerned. Method, I suspect, will be closely correlated with speed, 
roundabout or uneconomical methods taking longer in general than 
straightforward and economical methods. It is interesting to note that 
the method of attack in an intelligence test was taken account of in 
some cases by Binet in his original scale. Even in the Stanford Scale 
blank in several tests a space is provided for the examiner to note the 
method of solution used by the subject, altho no practical use is made 
of this in the final scoring of the test. 

The theoretical discussion of intelligence has also raised the ques- 
tion as to the relationship between intelligence and other aspects of 
the total personality, such as emotional and affective, temperamental 
and volitional elements. We are becoming more definitely conscious of 
the fact that our intelligence tests are measuring only one part of a very 
complex whole. In giving our tests we attempt to obtain maximum 
coöperation, interest, docility, initiative on the part of the subjects, and
in general we seem to get it. To the extent that such maximum
coöperation is not given by any subject, his score on the test becomes a
less valid measure of his intelligence alone, since there is included in
it, so to speak, an indirect measure of his non-coöperation and
included in a way which cannot be disentangled from his intelligence
score. So it is with honesty, conscientiousness, perseverance, and many
other character traits. All such character qualities and many others 
help to determine success in school and in later life, but we need meas- 
ures of such qualities in order to approach a better measure of the 
total personality. There has been great activity in this field recently*
and there will be much to record during the next ten years. More exact
measurement here will help to make our measures on intelligence more
accurate, in the sense that we will know the effect of different emotional
and character traits upon our tests.


* See the recent article by Symonds, P. M. “The Present Status of Character
Measurement.” Journal of Educational Psychology, Vol. 15, No. 8, November,
1924, pp. 484-498.

Another method of considering the present status of the intelligence 
testing movement is to analyze the publications that have recently ap- 
peared. I have made a brief analysis of about 160 books and articles 
bearing upon intelligence testing which have appeared during the last 
two years, 1923 and 1924. This does not pretend by any means to in- 
clude all articles on this subject, but rather the most important ones 
and those appearing in the more strictly technical journals. In my 
tabulation, 20 articles appear under more than one caption. 

In 30 of these articles I find some phase of the theoretical aspect 
of intelligence testing discussed. Ten of them deal with individual 
scales of measurement and 19 with group intelligence tests. The rest 
are primarily devoted to the application of tests to different types of
individuals. The 30 dealing with theoretical aspects of the subject are 
concerned with the definition or description of intelligence, the problems 
of validity and reliability of tests, the problem of the constancy of the 
I.Q., the influence of environment, and the probable shape of the curve
of growth of intelligence. These articles indicate very clearly the 
strong and persistent interest in the theoretical aspects of our problem. 
The 10 articles dealing with individual tests show relatively little work 
along the lines of new individual tests, only one or two new tests having 
appeared. They are mostly concerned with revisions or adaptations of 
old tests. The 19 articles devoted to group tests show a few new group 
tests, with modifications or revisions of old ones. 

The articles dealing with applications of intelligence tests to var- 
ious groups of individuals may be summarized as follows: the feeble- 
minded, 10; the superior, 18; the school child in general, 18; the col- 
lege student, 25; the delinquent, 8; the dependent child, 1; the deaf, 4; 
the blind, 2; the negro, 4; the foreign child, 8; the employee, 14; and 
tests of individuals with relation to the problem of the inheritance of 
intelligence, 9. The predominant interest at the present time would 
seem to be in testing college students, school children, superior chil- 
dren, and people employed in various occupations. This shows a de- 
cided shift from the earlier emphasis in intelligence testing. This 
earlier emphasis was undoubtedly upon the feeble-minded and the de- 
linquent. It would seem as if, for the time being, our interest in the in- 
telligence characteristics of these groups has been exhausted. On the 
other hand, attention has been sharply focussed upon the child of high 
ability, and we have learned a lot about him. He has become decidedly 
a school problem, and educators are not yet agreed as to how he should 
be handled. The large number of articles dealing with school children 
and college students is indicative of the fact that intelligence testing is 
rapidly becoming one of the accepted tools in school and college admin- 
istration. 
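
The tabulation above can be checked against the stated total. The sketch below simply adds the counts given in the text; the variable names are illustrative, and the reconciliation with "about 160" rests on the assumption (not stated in the text) that each of the 20 articles listed under more than one caption appears under exactly two.

```python
# Counts as reported in the text, by caption.
GENERAL_CAPTIONS = {
    "theoretical aspects": 30,
    "individual scales": 10,
    "group tests": 19,
}
APPLICATION_CAPTIONS = {
    "feeble-minded": 10,
    "superior": 18,
    "school child in general": 18,
    "college student": 25,
    "delinquent": 8,
    "dependent child": 1,
    "deaf": 4,
    "blind": 2,
    "negro": 4,
    "foreign child": 8,
    "employee": 14,
    "inheritance of intelligence": 9,
}
ARTICLES_UNDER_MORE_THAN_ONE_CAPTION = 20

# Every appearance of an article under a caption counts as one entry.
total_entries = sum(GENERAL_CAPTIONS.values()) + sum(APPLICATION_CAPTIONS.values())

# Assumption: each doubly listed article appears under exactly two captions,
# so it contributes one surplus entry.
distinct_articles = total_entries - ARTICLES_UNDER_MORE_THAN_ONE_CAPTION

# The four application captions with the most articles.
top_four = sorted(APPLICATION_CAPTIONS, key=APPLICATION_CAPTIONS.get, reverse=True)[:4]

print(total_entries, distinct_articles)
print(top_four)
```

Under that assumption the entries reduce to the figure of about 160 given earlier, and the four largest application captions are the ones named in the text as the predominant interests.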

The psychologist has frequently been criticized of late for the great 
number of tests which have appeared. At first, this criticism would 
appear just, for when I turn to a recent bibliography of tests and
measurements*, I am amazed to see 99 different intelligence tests listed.
But an inspection of the list shows that 54 are rarely used or local or 
obsolete, that 6 are English tests and not suited to American schools, 
that 3 tests have been mentioned twice under different names, and 2 
are not intelligence tests at all. This leaves 34 usable tests, 10 of which 
are individual tests such as the Binet or Performance Tests. There 
remain, therefore, only 24 group intelligence tests to cover the whole
range of intelligence from kindergarten to college, and to cover the 
various types of intelligence such as verbal and non-verbal. In view of 
what has been said about the general tendency at the present time to 
analyze intelligence into different types, it is obvious that we need more 
and better intelligence tests than exist at present. The number we have 
now may seem large to the amateur who thinks of intelligence as an en- 
tity that can be equally well measured by any test, but the number is 
woefully small, and the content covered by the tests is very restricted 
when we think of intelligence in the broader sense I have tried to in- 
dicate in this paper. 
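
The reduction from 99 listed tests to 24 usable group tests is a matter of simple subtraction, and may be set out as a short check. The figures are those given above; the names are illustrative only.

```python
# Figures as given in the text for the 1923 bibliography.
TESTS_LISTED = 99
EXCLUDED = {
    "rarely used, local, or obsolete": 54,
    "English tests not suited to American schools": 6,
    "mentioned twice under different names": 3,
    "not intelligence tests at all": 2,
}
INDIVIDUAL_TESTS = 10  # such as the Binet or Performance Tests

usable_tests = TESTS_LISTED - sum(EXCLUDED.values())
group_tests = usable_tests - INDIVIDUAL_TESTS

print(f"usable tests: {usable_tests}, of which group tests: {group_tests}")
```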

In conclusion, therefore, we see that at the present time, in addi- 
tion to a wide-spread practical use of intelligence tests in schools and 
colleges and business organizations, there is also much theoretical dis- 
cussion as to the nature of intelligence. From such discussion will 
come better tests and better methods of rating intelligence. Our in- 
struments of measurement and our technique will improve. The sphere 
of measurement is gradually widening to include the measurement of 
different kinds of intelligence, and to include the non-intellectual as- 
pects of the individual. Steadily and persistently the psychologist is 
approaching his ideal—the accurate measurement of the total person- 
ality. 

* Buckingham, B. R. et al. “Bibliography of Educational and Psychological Tests
and Measurements.” Bulletin 1923, No. 55. Bureau of Education, Washington, 1924. 






















The Scoring of Group Intelligence Tests 
RUDOLF PINTNER 


INTELLIGENCE tests have developed out of the personal interview of 
the examiner with the subject, where the rating or judgment made by 
the examiner was wholly subjective. The individual intelligence exami- 
nation is an attempt to make the interview less subjective, and it does 
this by attempting to make the situations or stimuli presented to the 
subject more or less similar for all subjects and in addition to score the 
responses made to these standard situations according to definite rules. 
The great variety of responses, however, which can be made to our 
individual intelligence tests makes the scoring a difficult matter, and 
the factor of the subjective judgment of the examiner is still con- 
siderable. 

The introduction of group testing has brought with it much more 
objectivity in the matter of scoring, altho at the same time by its very 
nature the group test has lost much of the control over each subject, 
which is alone possible in the individual examination. The individual 
examiner can control such factors as attention, interest, coöperation,
and the like to a much greater extent than can the group examiner. 
The increase in objectivity of scoring in group testing is, however, a 
great gain. The need for as much objectivity of scoring as possible 
was clearly obvious in the army testing, where men without any train- 
ing in psychology had to be used as scorers. It is from this period 
that we obtain our so-called “fool-proof” methods of scoring data. 

To what extent are our methods of scoring “fool-proof”? To what 
extent is there room for difference of opinion or divergence in the scor- 
ing of our group tests? How much emphasis must be given to this 
matter of scoring in the training of students in intelligence testing? 
Such questions have been brought to the writer’s attention in the process
of training students as intelligence testers. That there are difficulties 
and ambiguities is obvious from the questions raised by the students in 
the course of their work. To obtain some measure of these difficulties 
it was proposed to experiment with the National Intelligence Test, which 
is a good example of our present group intelligence tests. 

An experimental blank of the N.I.T., Scale B, Form 1, was prepared
by going thru a great number of actual test blanks filled out by
children and selecting ambiguous responses, which were incorporated in
this blank. Samples of such responses follow: In Test 1, omission of
decimal point in several items;
omission of qualifying descriptions of answers, such as hrs., lbs., $, and 
the like; use of decimals for fractions; failure to reduce fractions to low- 
est terms, etc.; in Test 2, the writing in of correct responses, the under- 
lining of two or more words or of a word and a half or of other than the 
response words; in Test 3, writing in of responses, underlining both 


(21) 

















22 BULLETIN OF THE SCHOOL OF EDUCATION 


responses, deleting the wrong response; in Test 4, writing in response; 
crossing out or encircling a word instead of underlining; deleting the 
three wrong words and leaving right one alone; underlining one and a 
half, two, or more words; in Test 5, response by marking all items D 
was used.

The experimental blank, constructed in this fashion, is therefore 
not similar to any one child’s paper, but is to be considered as containing 
a great number of the scoring difficulties which will confront the ex- 
aminer in the course of his work. By giving this blank as an exercise 
in scoring, we may get a measure of the amount of difference of opin- 
ion that may exist with reference to ambiguous responses and also a gen- 
eral measure of the accuracy of the scorers in handling a paper, in 
weighting, adding, and the like. A sufficient number of copies of this 
experimental blank were made in order to provide one for each student, 
and the scoring of the blank was made a class exercise. Later on a 
second blank was prepared of the National Intelligence Test, Scale B, 
Form 2, using the same types of ambiguities. The two blanks may be 
considered as roughly of equal difficulty. Two groups of students took 
part in the scoring. The 1923 group scored Form 1 only. The 1924 
group scored Form 1 in October, 1924, and Form 2 in January, 1925. 
The 1924 group were given the first test before they had had any dis- 
cussion of group testing or any practice in testing, except such as they 
received previous to enrollment in the course, and in the case of a few 
students such previous practice was considerable. The class is com- 
posed of graduate students in education. Most of them have had ex- 
perience as teachers or supervisors and several have held positions as 
school psychologists. 

The experimental blanks were also scored by the writer and his 
assistants. Ambiguities were discussed by them, and the most reason- 
able interpretation of the directions for scoring was taken. In this way 
a “correct” score was arrived at, but the writer is of course aware 
that there is still room for a difference of opinion of several points. 


The Distribution of Total Scores. The total scores obtained by the 
students differ from each other because of the differing interpretations 
given to scoring directions, because of errors in scoring, and also be- 
cause of errors in weighting and adding. 


                     1923 Group    1924 Group
                     Form 1        Form 1, October, 1924    Form 2, January, 1925
Range                45 to 71      [illegible]              58 to 90
Q3                   62            [illegible]              74
Med.                 [illegible]   [illegible]              70
Q1                   [illegible]   [illegible]              67
Q                    [illegible]   [illegible]              3.5
[label illegible]    [illegible]   [illegible]              42


The “correct” score for Form 1 was 66, and for Form 2 it was 74. 
The general tendency seems to be to score too severely. There is an 
alarming range of scores given to the same paper, but we must remem-
ber that this is not an ordinary paper such as the average child would
turn in. The total range of scores, from 33 to 90, is equivalent to a
range in mental age from about 7 to about 11. The repetition of the 
test in January, 1925, with the 1924 group shows a much closer agreement
with the “correct” score. The difference between the “correct” score and
the median student’s score in October is minus 8, and this is reduced to
minus 4 in January. There is also a tendency for the members of the class
to agree more closely with each other in ambiguous matters, as evidenced
by the smaller range and semi-interquartile range. Evidently the experience of the
class from October to January has had some effect. 
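
The semi-interquartile range referred to above is half the distance between the first and third quartiles. The following is a minimal sketch of the computation, not the writer's own procedure: the function name and the nine sample scores are illustrative assumptions, and the "inclusive" quartile convention of Python's `statistics.quantiles` is only one of several in use. The sample has been chosen so that its quartiles agree with the January figures reported above (Q1 = 67, median = 70, Q3 = 74).

```python
import statistics

def semi_interquartile_range(scores):
    """Return (q1, median, q3, Q), where Q = (q3 - q1) / 2."""
    q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q1, med, q3, (q3 - q1) / 2

# Direct check against the January figures in the text: Q3 = 74, Q1 = 67.
q_january = (74 - 67) / 2  # 3.5

# Hypothetical total scores given by nine scorers to the same paper
# (an assumption for illustration; not data from the experiment).
sample = [58, 62, 67, 68, 70, 72, 74, 79, 90]
q1, med, q3, q = semi_interquartile_range(sample)
print(q1, med, q3, q)  # 67.0 70.0 74.0 3.5
```

A smaller Q means that the middle half of the scorers lie closer together, which is the sense in which the class is said to agree better in January.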


The Number of Errors. Errors are to be considered as differences 
in interpretation of scoring items or actual errors due to oversight, mis- 
takes in subtotals of tests, and to mistakes in weighting and adding sub- 
totals to obtain total score. 


                        1923 Group                   1924 Group
                          Form 1       Form 1, October, 1924   Form 2, January, 1925
Range ................  [illegible]         [illegible]               1 to 14
Q3 ...................  [illegible]         [illegible]                  9
Med. .................  [illegible]         [illegible]                  6
Q1 ...................  [illegible]         [illegible]                  4
Q ....................  [illegible]         [illegible]                  2.5


The 1923 group shows the greatest number of errors. The 1924 
group shows a decided gain from October to January, the median num- 
ber of errors in January being five less than in October. The total 
range has been considerably reduced, and the reduction of Q shows that 
the class agrees among itself better in January than in October. No 
supervised practice in scoring N.I.T. blanks had been given in the in- 
terval. The improvement must have been due to the discussion of the 
October experiment, which was used as a means of bringing up general 
principles of scoring. In addition to this, National Intelligence Tests 
had been given and scored by the students in connection with the several 
projects undertaken by the class. Discussion of scoring had undoubtedly 
taken place among various groups of students working on the same 
project. Improvement in certain gross errors in the 1924 group may be 
noted from the following: 


                                                 October, 1924   January, 1925

[label illegible] .............................        4               0
Errors in getting sub-totals of tests .........        7               8
Neglect to multiply by weights ................        1               1
[label illegible] .............................        2               1
Crediting stereotyped response ................       14               0

There is distinct improvement in all types of error with the excep- 
tion of computing sub-totals of tests. The last type of error, which is 
called above “crediting stereotyped response”, is due to forgetting the 
rule that when a child uniformly marks one of two alternatives right 
down the page he is to be credited with zero for the whole test, rather 
than be given credit for such items as he may have marked correctly 
by chance. 




Ambiguous Responses. The type of response in which there was 
greatest disagreement between the students as a whole and the writer 
was the case where the child in Test 3 or Test 4 deletes the wrong item 
or items and allows the correct item to stand unmarked, instead of 
underlining the correct item. The writer credits such methods of in- 
dicating the answer by the child as falling under the general rule, “Any 
clear method of indicating answer is given full credit.” A difference of 
opinion may well exist as to whether this method is “clear” or not.
Many students argued emphatically that it was not “clear”, and it is
certainly true that the distinction between deleting and underlining is
not great. The next largest difference of opinion between the “correct”
score and the students’ marking was in an item where the child in under- 
lining a word had run on and included half or more of the next word. 
This the writer interpreted as a slip of the pencil, whereas the students 
in general were inclined to interpret it as an underlining of two words. 
Some of the large differences between the writer and his students were 
due to failure on the part of the students to note carefully some of the 
minor rules of the scoring key and to trust to their own judgment in- 
stead. 


Improvement in Scoring. With the 1924 group we may obtain some 
estimate of improvement during a three-months’ interval. During this 
interval all of them were engaged in working intensively with group 
intelligence tests of all kinds. They did not each receive the same 
amount of training with the National Intelligence Tests. Some re- 
ceived a great amount of training with these tests, others very little. 
All of them, however, received the benefit of the class discussion upon 
the results of the first experimental test in scoring, as well as the gen- 
eral discussions on scoring of all sorts of group intelligence tests. There 
were 38 students who took the two tests in October, 1924, and in Janu- 
ary, 1925. The average number of errors in October was 12.6, and this 
decreased in January to 6.9. The improvement for each student was 
also expressed as a percentage of the initial score. All but six showed 
positive gains, two showing zero gains, and four negative improvement. 
The median percentage gain was 44 per cent, Q1 being 17 per cent and
Q3 62 per cent. On the whole, then, considerable improvement results
from drawing specific attention to the scoring difficulties involved in the 
National Intelligence Tests, plus whatever practice in scoring tests may 
have been had in the interval. The amount of practice in scoring N.I.T. 
blanks was estimated each time. In October the median estimate was 
one blank, which means that 50 per cent of the class had had no 
previous experience with this group test. The other half of the class 
had previously scored from 1 to 700 blanks. In January the median 
jumps up to 137 blanks scored. At this time the lowest estimate is 50 
and the highest 1,000. 
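
The percentage of improvement used above may be sketched as follows. This is an illustration under stated assumptions: the "initial score" is taken to be the number of errors made in October, and the function name is hypothetical. Applied to the class averages reported in the text (12.6 and 6.9 errors), it gives a gain of about 45 per cent. That figure need not equal the median of the individual gains, since it is computed from averages.

```python
def percentage_gain(initial_errors, final_errors):
    """Reduction in errors as a percentage of the initial number of errors."""
    if initial_errors == 0:
        raise ValueError("initial number of errors must be greater than zero")
    return 100 * (initial_errors - final_errors) / initial_errors

# Class averages reported in the text: 12.6 errors in October, 6.9 in January.
class_gain = percentage_gain(12.6, 6.9)
print(round(class_gain, 1))  # 45.2

# Hypothetical individual cases: a zero gain and a negative gain.
print(percentage_gain(10, 10))  # 0.0
print(percentage_gain(10, 12))  # -20.0
```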


Influence of Previous Practice in Scoring. From the estimate of 
students as to the approximate number of N.I.T. blanks scored up to 
date, we may obtain some idea as to the influence of such practice on
the accuracy of the scoring of the experimental paper. The results
are as follows:

                                     Average Errors in Scoring Test
Estimate of Blanks Scored         1923      October, 1924   January, 1925

None .........................    17.7          11.5            . . .
1-50 .........................    15.0          12.6             5.8
51-100 .......................  [illegible]   [illegible]        6.0
Over 100 .....................     9.0          16.3             7.8

From the above, it would appear that the two classes behave dif- 
ferently. In the 1923 class those who have scored the greater number 
of papers tend in general to do better on the experimental blank. The 
reverse is true of the 1924 class. Evidently there is no general rule 
here as to the influence of previous scoring upon accuracy in scoring. 

From the 1923 group an estimate of the number of standardized 
tests of all sorts, both educational and mental, was obtained. This 
may be considered a measure of their previous familiarity with psycho- 
logical tests in general. The relation between such estimates and the 
errors made in the N.I.T. experimental blank may be shown as follows:

Estimate of All Blanks Scored     Number of Students     Average Errors

[illegible] ..................           10                   11.5
[illegible] ..................           12                   11.1
[illegible] ..................           12                   15.0
[illegible] ..................           12                   12.3


Practice in scoring papers in general would seem to have no notice- 
able influence upon accuracy in scoring N.I.T. blanks. It would seem as 
if the scoring of standard tests in general might tend to form scoring 
habits and that these persist and are not modified with practice, unless 
particular attention is drawn to them. This would point to the necessity
for “coöperative scoring”, the checking of other students’ work, and the
discussion of differences, because we have seen that where this occurs 
the number of errors is greatly decreased. 


Estimates and Amount of Improvement. The 1924 group took both 
experimental papers and estimated both in October and in January the 
number of N.I.T. blanks that they had scored. As we have seen above, 
the median of this estimate changed from 1 to 137. If we subtract the 
October from the January estimates for each student, we obtain the 
estimated amount of N.I.T. blanks scored during the interval. A study 
of the estimates together with the percentage of improvement in the 
N.I.T. scoring experiments shows no positive relation whatever. The
median improvement of 22 students who scored less than 100 tests in 
the interval was 57.9 per cent; the median improvement of 16 students
who estimated that they scored from 100 to 500 tests was only 30.3 per
cent. Of the four highest estimates, namely from 300 to 500 blanks 
scored in the three-months’ interval, one decreased in accuracy on the
N.I.T. scoring test, one showed no change, and the other two showed
improvements of 20 and 36.3 per cent, where the median improvement for
the whole class was 46 per cent. 

It would seem, then, as if a tendency to give a high estimate of 
the number of blanks scored, which may be a tendency to overestimate, 
is accompanied with inaccuracy or carelessness or the formation of 
scoring habits which cannot easily be modified by discussion and training. 

The estimates of the total number of standard tests, educational 
and mental, scored by these 38 students up to January, 1925, range from 
400 up to 10,000. The median improvement in the N.I.T. scoring ex- 
periment of the 21 students whose estimates are below 1,000 blanks 
scored is 54.5 per cent, while that of the 17 students estimating above 
1,000 blanks is 36.3 per cent. The rank correlation between improve- 
ment in the N.I.T. experiment and estimates of tests previously scored 
is minus .26, showing again what we suspected previously. There is a
slight tendency for those students, who estimate that they have scored 
large numbers of tests, to show less improvement in accuracy of scor- 
ing N.I.T. blanks than is the case with students whose estimates of 
tests scored are more moderate. 
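
A rank correlation of the kind reported above may be computed as in the following sketch. It assumes the Spearman coefficient (the product-moment correlation of the ranks, with tied values given the mean of their ranks), since the text does not name the formula used. The six pairs of values are hypothetical and are not taken from the experiment.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_correlation(x, y):
    """Spearman rank correlation: product-moment correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical estimates of tests scored and percentage improvement
# for six students (illustrative only).
tests_scored = [400, 800, 1000, 2500, 5000, 10000]
improvement = [60.0, 45.0, 55.0, 30.0, 36.3, 20.0]
print(round(rank_correlation(tests_scored, improvement), 2))  # -0.89
```

A negative coefficient, as here, means that students with the larger estimates tend to stand lower in improvement. The sketch does not guard against the case in which all values of one variable are equal.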

We must not forget, however, that in all cases there was decided 
improvement during the three-months’ training in group testing. The 
fact that the more experienced scorers made less improvement than the 
less experienced points to the probability of “scoring habits” which are 
not so easily modified. The general fact of improvement in all cases 
points to the desirability of training in group intelligence testing. The 
large differences in scores found suggest the need for coöperative
scoring and for checking of papers. It also indicates that our scoring 
keys and directions to scorers contain a great many ambiguities. If 
this is the case with the National Intelligence Test, where the ob- 
jectivity of the test is great, it will be truer of other tests that are less 
objective. Finally, we must bear in mind that the conditions of this 
experiment are artificial in the sense that no actual child’s paper would 
ever contain all the doubtful responses gathered together in the experi- 
mental paper in question. There is much less deviation in the scoring 
of actual papers, because many ambiguous responses are not likely to 
occur in one paper. The writer has certain data on this point which 
will be published later. The facts in the present article show that it is 
necessary to train workers in group testing if we are to expect agree- 
ment in scoring methods, and, furthermore, that mere experience in 
scoring group tests is no guarantee of accuracy. 











Recent Research in Vocabularies Most Needed 
in Adult Writing 


ERNEST HORN, Professor of Education, University of Iowa


In the next few minutes I shall try to describe very briefly some
recent investigations which were undertaken for the purpose of dis- 
covering which words are most important in the adult writing vocabu- 
lary. 

There are at least five possible measures of the importance of a 
word, in adult writing vocabularies. First, the word must be used with 
considerable frequency. A recent book purporting to contain 5,000 words 
commonly misspelled contains words like acciaccatura, eschscholtzia,
and gallimaufry. Do you commonly misspell these words? It is of
course ridiculous to include such words among those commonly misspelled.
The average college graduate does not even know what they
mean. Our recent studies, conducted under a grant from the Common- 
wealth Fund, have resulted in the compilation of the results of the dif- 
ferent investigations which have been made in the past, and have 
added new investigations sufficient to make up over 5,000,000 running 
words. These investigations furnish the data which can be used as a 
measure of frequency. 

The second criterion of importance is that of universality. To sat- 
isfy this measure, a word should be used by a large proportion of 
various classes of adults in writing. It ought also to be used in dif- 
ferent types of communities in various parts of the United States. The 
investigations earlier referred to have included correspondence from all 
types of communities and from all sections of the United States. 

A third measure of importance is that of permanence of the word. 
We can hardly justify teaching the school child of today words which 
he is not likely to need when he has finished school. The data which 
we have, while not completely satisfactory, do give some measure of 
permanence since they include the analysis of a considerable amount of 
written material one or more generations old. 

The fourth measure of importance is that of quality. When the
first investigations of written correspondence were completed, there were 
individuals who objected to basing a course of study in spelling upon 
these investigations because of the possibility that, while these words 
may be those which people do use, they are not the words people ought 
to use. In order to get some measure of quality the present investiga- 
tion included the analysis of about 690,000 running words of the cor- 
respondence of gifted writers, as well as about 215,000 running words 
of letters contributed to periodicals and newspapers. These data fur-
nish an admirable measure of the quality of words. That is, they give
us for the first time data concerning the type of vocabularies which
characterizes the writing of those who use the English language best. 

A fifth measure of importance may be called cruciality. By cru- 
ciality is meant the seriousness of misspelling a given word. It is 
clearly not so serious to misspell a word in a letter to a friend as it is to 
misspell a word in a letter in which one is applying for a position. 
Both thru the analysis of business letters and thru the analysis of a 
large number of letters of application, a fair measure of cruciality has 
been obtained. 

Some notion of the nature and extent of recent investigations can 
be gotten from an examination of the following tables and charts. The 
accompanying chart shows how complicated is the problem of determin- 
ing the common words which should be included in a course of study. 
It attempts to show the large number of sources from which words get 
into common usage. You will notice that the arrows point in both 
directions. This is to indicate that words are not only coming into the 
English language, but also going out. Most of the words shown on this 
chart are now recognized by the average educated individual. However, 
one word, niman, meaning to take, is now unknown except to the
student of the history of the English language. It was formerly as
commonly understood as take now is.











[Chart: sources from which words come into, and pass out of, common usage. Labels legible in this copy include SLANG, TECHNICAL, FOREIGN, and COMMON WORDS.]


TABLE I. SUMMARY SHEET OF VOCABULARY RESEARCH CARRIED ON UNDER A
GRANT FROM THE COMMONWEALTH FUND

                                               Total       Number of   Number of
Type of Study                                Frequency      Running    Different
                                             of Words        Words       Words

Business correspondence ...................   659,224     1,648,060     15,152
Letters of literary men ...................   275,779       689,448     23,581
Personal correspondence ...................   561,056     1,402,640     19,243
Letters of application and recommendation .    61,456       153,640      5,012
Material contributed to newspapers and
  magazines ...............................    86,162       215,405     13,328
Minutes ...................................    49,479       123,698      5,728
Excuses sent by parents to teachers .......    12,109        30,273        892
Letters of one superintendent of schools
  for eight years .........................    91,984       229,960      6,512
Composite, made up of: ....................                 678,153      6,889
    Burke ................     13,356
    Andersen .............    350,389
    Cook and O’Shea ......    193,737
    [name illegible] .....    100,000
    Ayres ................     20,671
                              -------
                              678,153

Total number of running words analyzed ...........  5,180,277
Total number of different words found ............     38,162
Old composite list ...............................    800,916 running words
Words omitted in Commonwealth list ...............    485,730 (60.64 per cent)

Lists in composite list combined with business correspondence:

    Bankers’ letters .....     67,581 running words
    [name illegible] .....      2,441
    The Emporium .........     10,838
    [name illegible] .....     65,500

Clarke, 28,292 running words, combined with newspapers and magazines


Table I shows the types of vocabulary which have been analyzed up 
to the present time. Some idea of the detail of the investigation can
be gotten from Tables II, III, and IV.





TABLE II

 Total     Words            Total
                          Frequency

    66     army                22
           [illegible]          4
   765     around             153
     7     arouse               7
     3     aroused              3
     3     arouses              3
     1     arousing             1
 1,645     arrange            329
   520     arranged           130
 1,290     arrangement        258
 1,356     arrangements       339
     1     arranges             1
   148     arranging           37
    42     arrears             21
     1     arrester             1
     3     arrestors      [illegible]
   496     arrival            124
   396     arrive              99
   596     arrived            144
   183     arrives             61
    90     arriving            30
     1     arsenate             1

[Columns 1 to 26 of the table, which give the frequency of each word in
each of the 26 types of business, are not legible in this copy.]








Table II represents a portion of the alphabetical list of the final 
compilation of the vocabularies of 26 different types of business. You 
will notice that the word arrestor was found in but one type of busi- 
ness, while arrange was found in almost every type of business. No 
doubt if more correspondence had been analyzed it would have been 
found in every type of business. It is clear that this table not merely 
gives a measure of the frequency with which a word is used in business, 
but also a measure of the universality of the word in the various types 
of business. The final credit which the word received was obtained 
by multiplying the number of times the word was found in all types of 
business by the square root of the number of different types of business 
in which the word was found. This weighting was adopted in order to 
give more credit to words occurring a few times in each of a large 
number of types of business than to words occurring with the same total 
frequency in but one or two types of business. 
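
The weighting just described may be sketched as follows. The function name and the two sample words are illustrative assumptions; only the rule itself (total frequency multiplied by the square root of the number of types of business in which the word was found) comes from the text. The legible entries of Table II appear consistent with it: arrange, for example, shows a frequency of 329 and a total of 1,645, which is 329 multiplied by 5, the square root of 25.

```python
import math

def weighted_credit(counts_by_business):
    """Total frequency multiplied by the square root of the number of
    types of business in which the word occurs at least once."""
    total = sum(counts_by_business)
    types_found = sum(1 for c in counts_by_business if c > 0)
    return total * math.sqrt(types_found)

# Two hypothetical words, each found 100 times among 26 types of business.
widespread = [4] * 25 + [0]        # 100 occurrences spread over 25 types
concentrated = [100] + [0] * 25    # 100 occurrences in a single type

print(weighted_credit(widespread))    # 500.0
print(weighted_credit(concentrated))  # 100.0
```

With equal total frequency, the word spread over 25 types of business receives five times the credit of the word confined to one, which is the effect the weighting was adopted to secure.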


[TABLE III. Section of the final tabulation of personal correspondence, giving for each word its frequency in each section of the United States (NE, MA, NC, SC, P, W, SA) together with the unweighted and weighted totals. The entries are not legible in this copy.]


Table III displays a section of the final tabulations of personal 
correspondence. You will see from this table that each section of the 
United States is represented. NE stands for the New England states, 
MA for the Middle Atlantic States, NC for the North Central states, 
SC for the South Central states, P for the Pacific states, W for the 
Western states, and SA for the South Atlantic states. From this in- 
vestigation it is possible to tell whether or not a word is used in all 
parts of the country, or whether a word is merely a local word. The 
items in this table are weighted according to the same plan as that
used in the preceding chart. 








[TABLE IV. Sample of the final summary sheet, giving for each word (army through arranging) its position in the Thorndike list, its frequency in each investigation (B, LM, P, A&R, N&M, M, E, S, C), and the total. The entries are not legible in this copy.]


Table IV shows a section of the final tabulation sheets. In this 
table, B stands for business correspondence, LM for the letters of lit- 
erary men, P for personal correspondence, A&R for letters of applica- 
tion and recommendation, N&M for material contributed to newspapers 
and magazines, M for the vocabulary of minutes, E for excuses sent by 
parents to teachers, S for the letters of one superintendent of schools, 
and C for the composite list. The totals are the sums of the frequencies 
of the words in all these various investigations. At the left of the 
word you will notice the location of such words as are found in the 
Thorndike list. You should keep in mind that Thorndike did not re- 
port separately certain derived forms. His statement is as follows: 


“It should be noted that, except for special reasons, separate en- 
tries are not made of plurals in s; plurals where y is replaced by ies; 
adverbs formed by adding ly; comparatives and superlatives formed by 
adding er and est, or r and st; verb forms in s, d, ed, and ing; past parti- 
ciples formed by adding n, and adjectives formed by adding n to proper 
nouns.” 


He adds, however, “. . . this is not a spelling list. If it is used 
as an aid in the construction of spelling lists, the derived forms in s, ies,
ly, er, r, est, st, s, ed, d, ing, and n should be inserted. They may offer 
notable difficulty in spelling even when easily read and understood by 
derivation.” 

Having gotten the totals after the manner indicated in Chart 4, the
words were arranged in order of frequency. You can readily see that 
these final summary sheets contain data for applying each of the five 
measures of importance which are outlined at the beginning of the lec- 
ture: namely, frequency, universality, permanence, quality, and cru- 
ciality. It is therefore possible to determine the relative importance of 
the 38,162 different words which were found in the investigation. 

The question of how many words to include in the course of study is 
somewhat more difficult. It can be answered in part by an examination 
of the summary tables in order to determine the limit beyond which the 
returns from teaching are likely to diminish so rapidly that the relative 
value of additional words would not be sufficient to keep them in a 
course of study. 

There is another practical way of determining the answer to the 
problem. It has been shown that with the best of modern methods and 
books, a little over 4,000 words can be taught by the end of the eighth 
grade with practical perfection. By practical perfection is meant a spell- 
ing average of from 95 per cent to 99 per cent. Somewhere between 4,000 
and 5,000 words would therefore seem to be the largest number of words 
that can safely be taught in the basic list for the eight grades.

The question comes as to whether the list should be further re- 
stricted and the time for spelling reduced accordingly. This question can 
only be answered by determining whether the time thus saved would 
be as well used in teaching some other subject. Since the time now given 
to spelling is moderate, it seems likely that there should at present be no 
further reduction. The public already suspects the schools of sacri-
ficing fundamental subjects for untried subject-matter which it is likely
to designate as fads. Moreover, justly or unjustly, the public still im-
poses a severe penalty upon those who misspell words. This penalty
may even amount to the loss of an opportunity by an applicant for a 
position. Under the circumstances, it seems advisable to keep our basic 
course of study in spelling at from 4,000 to 5,000 words. 





What Should Tests Measure? 


ERNEST HORN 


In teaching any subject it is of the utmost importance to be able to
tell the degree to which we have really accomplished what we set out 
to accomplish. This is important for several reasons. First, it shows 
the pupil the efficiency with which he has studied. It shows him the 
degree to which he has accomplished what he set out to accomplish. Al- 
tho this value is often neglected, it is one of the most important uses 
which tests serve. No real morale can be maintained in a school in 
which the pupils take results for granted. Second, it shows the teacher 
the efficiency with which she has taught. Subordinate to this value are 
two functions which proper testing may serve: (a) A test may show 
the value of a text; (b) It may show the value of any given method 
which the teacher uses. Third, it enables us to compare the efficiency of 
one school with the efficiency in the country at large. As will be shown 
later, this comparison must always be made with great care. If the 
comparison is properly made and in the right spirit, it is always stimu- 
lating to ask, “If the country at large can reach this standard, why 
cannot this school do it?” 

We do not believe in education by faith. We want evidence of re- 
sults. This evidence must be gotten thru some sort of appraisal. This 
appraisal is usually in the form of a test. What are the guiding prin- 
ciples by which one may direct his efforts to determine scientifically 
how well the school has accomplished what it started out to accomplish 
in any given subject? 

There are two principles to keep in mind in testing any subject. 
First, the test must keep within the limits which are useful in life out- 
side the school. There is no value in subjecting a child to a test on 
what he ought not be expected to learn. There is little value in sub- 
jecting him to a test on what is relatively unimportant to learn. Sec- 
ond, the test given during the term should be limited to the purposes 
set up by the teacher and pupils to be accomplished during that term. 
It is obviously impossible to test the efficiency with which pupils studied 
in a given year or term by examining them upon items which they did 
not study during that term. It is obviously unfair to judge the ef- 
ficiency of a teacher’s work during a term by testing her pupils upon 
items which she was not expected to teach during that term. 

For the purpose of illustration, let me apply these two principles 
to the testing of spelling. I use spelling as an illustration for four 
reasons. First, we have reasonably adequate knowledge of the words 
which should be included in the course of study in spelling; second, 
perhaps in no other subject are the tests so satisfactory; third, it is 
easy to see how the two principles have been violated in testing this sub-
ject; fourth, there is probably no other subject in which the abuses of
these two principles have been so glaring. 

Consider the first principle. Stated positively it is that term tests 
in spelling, whether from a standard scale or otherwise, must contain 
only words of unquestioned social utility. This social utility is now 
adequately known as the result of research. The measure of social 
utility to which the speaker will refer in this lecture is a compilation 
containing all the investigations of written correspondence which were 
completed up to January, 1923. This compilation contained about 800,- 
000 running words. Since that time an investigation under a grant 
from the Commonwealth Fund has added enough so that we now have 
tabulations of over 5,000,000 running words of varied writing needs. 

Perhaps the best known of the various spelling scales is that made 
by Ayres. In making this scale Dr. Ayres attempted to select the 1,000 
words of highest social utility. We now know that about two-thirds of 
the words in his scale are not among the 1,000 commonest words. On 
the other hand, more than 95 per cent of these words are important 
enough to be included in the elementary school course of study. Some 
of the later scale-makers have not been so particular about utility of 
the words in their scales. I should like to repeat that the percentages 
which I am about to give are based on all of the vocabulary research 
completed up to January, 1923. These percentages probably will not 
be greatly changed by the research which has been completed since that 
time. 

Slightly more than half of the words in the Buckingham exten- 
sion, about two-thirds of the words in the Stanford Test, and about two- 
thirds of the words in the Sixteen Spelling Scales are important enough 
to be included in an elementary school course of study of 4,000 words. 
Many of these scales contain words which do not fall among the 20,000 
commonest words. The Ashbaugh scale, altho it contains 3,000 words, 
ranks highest in the percentage of useful words which it contains. 


TABLE I. SUMMARY OF SOCIAL EVALUATION OF VARIOUS SCALES BASED ON
COMPILATION OF ALL STUDIES BEFORE 1923


Name of Scale                  Found in First 4,000    Not Found at All
Ashbaugh                              97.5                    0
Ayres                                 95.5                    1
[name illegible]                      67.0                   20.0
[name illegible]                      62.9                   17.7
Buckingham Extension                  53.7                   25.7


Table I shows the extent to which the various scales are limited to 
words which are important enough to be included in a basic spelling 
list. It should be obvious to everyone in this audience that if a 
sampling is taken from a spelling scale for the purpose of measuring 
the efficiency of a school system, the test will be unfair in the propor- 
tion in which it contains the words which have little or no usefulness. 
Such words should not be taught and therefore should not be tested. 
Regardless of the efficiency of teaching, a school which teaches its chil-
dren only words of importance will make a poor showing when meas-
ured by a test containing a large percentage of words of little or no
importance.

Consider the second principle, which is that the efficiency of the 
teaching or study of spelling in a given term cannot be measured by 
using words not taught during that term. Table II illustrates this: 


TABLE II. GRADE OCCURRENCE AND SOCIAL UTILITY OF SAMPLE TEST SCALE B,
SPELLER A-B


Word Speller A Speller B Compilation 
EI nics = oo dc cnreie dec eie ame Ria 0 seas 0 
GE occ Fe waactaatcaevioeartetes 0 0 
WN <5 cron op ba aad eases 0 Ae 4a 
QUE hs ch oes oa tos Mae 7 8b 2b 
ee ES SE ee oe. ee 0 5b 
re eS ERE Saree eee nen a Lane te Se 0 0 
Crk ook 8 Ce hie beek e 7 4a 
WE 4) 2.51.5 nc cnug agseubeadar tenes 6 4b 
SN. a Stn stg cdmcsnacese eee eee 0 0 
| SEARS Cae eae e oer SN Oe RR aye 8 3a 


Standards: Grade 6, 58; 7, 73; 8, 84. 


Because of the limited space on a slide [here the author showed 
slides], this test contains only 10 words, but the slide was made from 
a longer test which showed the same sort of thing that this selected 
test shows. Suppose that an eighth grade studying Speller A were to 
be tested by this test. You will see that only one word in the test would 
have been studied in that grade, and only four words in any preceding 
grade. So far as the effect of teaching is concerned, the eighth grade 
teacher could not profit from her efforts except in the case of the word 
grateful. Suppose the test were used in the first half of the eighth 
grade in a school which uses Speller B. Then no word in the test would 
have been taught during that term. 

In effect this means that you say to a grade, “You haven’t studied 
your work very well this term because you can’t spell the words you did 
not study.” It means that you say to the teacher, “You did not do a 
good job of teaching spelling since your pupils cannot spell the words 
you did not teach them.” In the case of Speller B, the pupils may spell 
correctly all the words they did study and yet be unable to spell any 
given word in this test. Both pupils and teacher have a right to resent 
a test sampled from a standard scale without regard to the words which 
make up the course of study for the period which the test is supposed to 
cover. It is disheartening to do a good piece of teaching, study, or 
supervision and have the results obscured by such a method of testing, 
and yet you know that schools have been praised or damned by tests 
which were quite independent of either the course of study in the school 
as a whole or the course of study in the various grades in which the 
tests were given. 

Summarizing, such tests mean that the tester says in appraising a 
school system, “The spelling in this school is really not what it should 
be; first, because the pupils cannot spell the words they should not
know how to spell, and second, because the pupils in a given grade can-
not spell the words that have never been learned.” It is hard to see 
how a practice so obviously absurd has continued so long as it has. 
Contrast this method with a properly selected term test. 


TABLE III. GRADE 3: BEFORE AND AFTER TEACHING


                          September    January
Per cent Correct          (Before)     (After)

[The body of this table is illegible in the source.]


Table III shows the result of a preliminary test made from sam- 
pled words about to be taught for a term, and a final test made by 
using a similar method of sampled words which were taught during 
that term. You will notice that standard spelling scales were used to 
discover the standard difficulties of these words. Doesn’t this give a 
much more encouraging picture to the pupils, the teacher, and to the 
superintendent? The test, being a fifty-word test, is a rigorous one. 
Those pupils who have done their work effectively discover that fact by 
the tests. 







TABLE IV. GRADE 6: BEFORE AND AFTER TEACHING


                          September    January
Per cent Correct          (Before)     (After)

[The body of this table is illegible in the source.]


Table IV shows the same kind of improvement in the case of a 
sixth grade. Such a table gives a clear measure of the progress of 
the grade and of the individuals in it. You will notice that the classes 
for which these slides were made were rather small, but the results 
obtained have been equalled or even surpassed by public school grades. 


TABLE V

End-of-Term            Grade 6                     49 Pupils
50-Word Test           Standard Difficulty 74.5

Per cent Correct       Frequency
80 - 84                    3
85 - 89                    7
90 - 94                   11
95 - 99                   12
100                       16

Lower quartile......  92
Median..............  98
Upper quartile...... 100


All pupils above standard


Table V gives the results of a test given at the end of a term in a 
grade of 49 children. You will notice that the percentages here com- 
pare very favorably with those obtained in smaller classes. These pupils 
knew what words they were to study. They studied them rigorously, 
using an effective method, and at the end of the term made scores 
which justified to them, to their teacher, and to the superintendent the
time they had given to this subject. Such testing increases rather than
disturbs the morale of the teacher, pupils, and patrons. 
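
The quartile figures in Table V can be checked against its frequency column. The following is only an illustrative sketch, not part of the original study: it totals the frequencies and finds the score interval in which each quartile position falls. The position rule used (the pupil ranked ceil(q × N) from the bottom) is an assumption, since the lecture does not say how the quartiles were computed.

```python
import math

# Frequency distribution from Table V: (per cent correct, number of pupils).
table_v = [("80-84", 3), ("85-89", 7), ("90-94", 11), ("95-99", 12), ("100", 16)]


def interval_containing(rank, dist):
    """Return the interval holding the pupil of the given rank (1 = lowest score)."""
    running = 0
    for label, freq in dist:
        running += freq
        if rank <= running:
            return label
    raise ValueError("rank exceeds number of pupils")


n_pupils = sum(freq for _, freq in table_v)

# Assumed position rule: the pupil ranked ceil(q * N) marks the quartile.
quartile_intervals = {
    name: interval_containing(math.ceil(q * n_pupils), table_v)
    for name, q in [("lower quartile", 0.25), ("median", 0.50), ("upper quartile", 0.75)]
}

print("pupils:", n_pupils)
for name, interval in quartile_intervals.items():
    print(name, "falls in", interval)
```

Under this assumption the three intervals agree with the reported quartiles of 92, 98, and 100.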

Summarizing, there are two principles which must be kept in mind 
in selecting a term test. First, it must not contain words which pupils 
do not need to learn, and, second, it must not contain words which are 
not taught during the term over which it is a test. 

What I have said does not mean that standard tests have lost their 
usefulness. They have two important functions: first, they enable the 
teacher to make preliminary and final tests of known difficulty; and, 
second, they enable the teacher to compare the results of her teaching 
with those obtained on the average in the country at large. 

The principles which I have urged in spelling apply to every school 
subject. In reading, arithmetic, geography, and composition the pupils 
of a given grade are not infrequently measured by a test of abilities 
which are either not very important or which were not meant to be de- 
veloped in the term’s work which the test is supposed to measure. I be- 
lieve very strongly in tests. I do not believe rigorous scholarship is 
possible without them. But in order to get the greatest educational 
value from a testing program, one must make certain that the test em- 
phasizes, first, those things in a course of study which are worth while, 
and, second, those things which are supposed to have been studied in 
the period covered by any given test. 








A Study of Handwriting in Forty Indiana 
Cities 
(A Preliminary Report) 


W. W. BLACK, Professor of Elementary Education, and JOHN DALE
RUSSELL, Graduate Student in Education, Indiana University


THE study herewith presented has been carried on under the direc-
tion of Professor W. W. Black by the Bureau of Coöperative Research
during the preceding seven years. It represents an attempt to analyze 
the progress thru the grades in speed and quality of handwriting of a 
fairly constant group of children. 

The method of the study is in decided contrast to the usual tech-
nique of taking a cross-section sampling of all the grades at a given
time. Instead of this usual procedure, in the study now being re- 
ported, samples of handwriting were secured from pupils enrolled in 
the 2A grade in May, 1917, in 40 Indiana cities. In May of each suc- 
ceeding year since 1917, samples were secured from one higher grade 
than had been sampled the previous year. Thus the 3A grade was 
sampled in 1918, the 4A in 1919, and finally the 8A in 1923.

It is not our intention in this report to discuss the relative advan- 
tages of the continuous group method as opposed to the cross-section 
method. Since, however, the method used in this study is different from 
the rapid-fire cross-section method, it is hoped that certain factors per- 
taining to handwriting may be shown in a new and different relief. 

The plan of sampling advancing grades in the succeeding years was 
evolved with the hope of securing samples from a fairly satisfactory 
number of individual pupils for the whole course of the experiment. 
From a check of our samples it develops that we have such data from 
275 individual pupils. One might at first be astonished to think that, 
from an initial sampling of 5,700 cases in the second grade, only 275 
pupils, or about 5 per cent of the total, would be sampled in all succeed- 
ing seven years. 
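
The proportion quoted above is a single division. A minimal sketch of the arithmetic, using the rounded figure of 5,700 given in the text rather than the exact count in the tables:

```python
initial_cases = 5700     # approximate second-grade sampling, as quoted in the text
continuous_cases = 275   # pupils sampled in every year of the study

retained_pct = 100 * continuous_cases / initial_cases
print(f"{retained_pct:.1f} per cent of the initial cases were sampled every year")
```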

Besides the ordinarily thought-of factors in the school system, such 
as non-promotion and elimination, which would tend to cut down the 
number of continuous cases, we have several other factors to deal with, 
particularly non-attendance on the particular day on which samples were 
secured, failure on the part of some school systems to get any samples 
at all in certain years, double promotions, and possible failure of cer- 
tain teachers within a system to sample their classes. 

However, altho we might be at first somewhat disappointed at the 
few cases having continuous samples, yet the number of 275 is suf- 
ficiently large to base rather valid conclusions upon. As a matter of 
fact, the number of cases is large enough that the norms were not
materially affected by drawing this sampling of 275 cases out of the
original number. A table will be shown later giving evidence of this 
fact. 

Uniform directions were given for the securing of samples. Uni- 
form blanks, upon which the children were to write, were distributed to 
all the cities; and every child present in the grade to be sampled on the 
day when the test was given was expected to be sampled in each of the 
cities. For one reason or another, certain cities were unable to secure 
samples during some of the years. 

Beginning with the second year of the experiment, the handwriting 
of the teacher was sampled along with that of her class. All together, 
about 33,000 samples of handwriting were secured. 

After the samples had been secured in the last year of the study, 
all the samples were scored by one person, using the Gettysburg Edition 
of the Ayres Handwriting Scale. By having all the scoring done by one 
person, whose scoring we are reasonably sure was rather consistent, we 
are able to make fairly accurate comparisons among the different groups 
included in the study. We have not as yet ascertained the validity of 
the scoring estimates, i.e., we only know that the scoring is consistent 
within itself. We do not as yet know whether it is too high or too low, 
based upon average estimates such as are given in the standard norms
in the Ayres Scale. This fact will be determined and shown in the 
final report. For that reason in this report no effort will be made to 
compare the standing of Indiana cities in handwriting with standard 
norms. As a matter of fact, the large number of cases upon which our 
averages are based in this study is almost sufficient ground to warrant 
our assuming these averages as standard Indiana norms, provided the 
validity of the scoring had been ascertained. 

Table I shows the averages by grades of the quality of handwriting 
of cities participating in the experiment. A reference to the last line 
of figures in Table I shows the distribution of these cases among the 
various grades. It should be noted that the study started with 5,733 
cases in the 2A grade and ended with 3,343 cases in the 8A grade.
These figures are given to show that averages are based upon a very 
large number of cases. The number of teachers from whom samples 
were secured varied from 194 in the 5A grade to 48 in the 8A grade. 
In the last line of Table I are given the final averages for the grades 
as a whole. These averages are not the averages of the mean figure for 
each city, but are based upon a separate compilation of the scores made 
by all of the individuals in each of the grades, irrespective of their city. 
















TABLE I
HANDWRITING STUDY. (Data compiled April, 1925.)
Averages by Grades of Quality of Handwriting for Cities participating in Experiment. (Scored on
Ayres Scale.)


Grade                  2A      3A      4A      5A      6A      7A      8A
Year sampled           1917    1918    1919    1920    1921    1922    1923

*Anderson              33.6
*Bedford               28.4
Bluffton               32.7
Columbus               27.5
Connersville           27.9
Crawfordsville         27.2
Decatur                ..
*East Chicago          32.6
Elkhart                27.1
Elwood                 33.8
Evansville             28.2
Fort Wayne             22.9
Frankfort              26.9
Franklin               32.2
Goshen                 ..
Hartford City          30.3
Indiana Harbor         ..
Kendallville           24.1
Kokomo                 28.1
Lafayette              30.4
Lebanon                29.8
Logansport             38.4
Madison                30.3
Marion                 26.2
Martinsville           ..
Michigan City          28.0
Mt. Vernon             30.8
*Muncie                29.1
Peru                   30.0
Plymouth               28.1
Princeton              33.3
Rushville              32.1
Salem                  31.9
Shelbyville            33.8
South Bend             35.1
Sullivan               37.7
Terre Haute            27.9
Valparaiso             25.1
Vincennes              29.9
Wabash                 30.8
Washington             36.0
Whiting                ..

Average of all         29.8    36.9    43.0    47.9    46.5    50.8    53.2
Number of cases        5,733   5,437   4,919   4,905   4,425   3,866   3,343

[Entries for individual cities in grades 3A to 8A are illegible in the source; .. marks a blank or illegible entry.]


The cities marked * began the experiment in 1916, a year earlier than the other cities. 


It is not our purpose at this time to enter into any discussion of
the variations that are found between cities in Indiana. A mere glance
at the table is sufficient to show that a wide variation exists. These
data are presented in order that anyone interested in a particular city
may be able to find the position of that city with reference to the aver-
age quality for the whole group of cities.













TABLE II. HANDWRITING STUDY. (Data compiled April, 1925.)
Averages by Grades of Speed of Handwriting for Cities participating in Experiment (Scored on Ayres
Scale).


Grade                  2A      3A      4A      5A      6A      7A      8A
Year sampled           1917    1918    1919    1920    1921    1922    1923

*Anderson              17.7    36.5    54.9    56.6    61.7    84.9    92.1
*Bedford               ..      ..      40.1    46.8    48.8    63.2    83.6
Bluffton               21.1    32.6    53.3    39.9    73.9    ?4.2    71.6
Columbus               31.9    31.3    39.5    45.4    50.1    61.8    67.7
Connersville           39.0    44.7    54.4    58.0    112.4   92.8    87.9
Crawfordsville         30.7    39.9    58.0    74.1    93.4    89.7    97.0
Decatur                ..      52.9    53.8    75.6    87.8    84.0    100.6
*East Chicago†         24.2, 33.3, 36.6, 46.7, 88.8
Elkhart                29.3    40.6    43.6    49.8    73.5    74.3    84.1
Elwood                 25.9    28.8    53.0    53.7    80.5    102.4   91.6
Evansville†            3.1, 35.9, 61.2, 69.5
Fort Wayne             30.1    47.0    50.3    55.7    73.3    85.6    88.1
Frankfort              24.2    42.9    43.2    44.8    58.6    60.8    74.2
Franklin               18.3    31.1    41.7    57.8    47.8    69.2    87.8
Goshen                 ..      ..      44.4    55.9    59.9    76.1    81.9
Hartford City          24.4    42.2    57.9    57.2    62.0    78.8    100.0
Indiana Harbor†        13.9, 37.3, 31.7, 48.8
Kendallville†          35.6, 34.1, 77.4, 71.7, 87.4
Kokomo†                22.0, 35.4, 47.5, 50.5, 75.6, 69.3
Lafayette†             35.4, 33.8, 45.9, 54.5, 76.1, 78.2
Lebanon                23.0    35.3    49.0    ..      66.8    93.9    92.1
Logansport             29.3    29.9    39.1    48.4    51.4    74.9    80.6
Madison                24.0    38.6    42.5    47.7    76.0    91.5    104.6
Marion                 29.1    48.8    53.3    69.5    69.8    60.5    83.3
Martinsville†          67.2, 56.4
Michigan City          31.2    54.5    55.6    66.9    81.4    91.4    82.5
Mt. Vernon†            38.8
*Muncie                17.8    28.2    43.2    34.3    57.4    70.9    79.9
Peru†                  31.3, 35.8, 47.1, 65.6
Plymouth†              54.9, 58.0, 71.4, 82.0
Princeton              36.8    46.2    55.6    63.6    65.0    70.7    85.1
Rushville†             40.9, 58.7, 57.7, 84.0, 88.2, 11.4
Salem                  24.0    42.8    55.1    49.8    23.1    62.2    81.0
Shelbyville†           24.9, 36.2
South Bend†            38.0, 46.1, 58.3, 79.7, 83.7, 95
Sullivan†              24.2, 47.5, 88.5, 11.1
Terre Haute            29.1    39.5    43.8    46.3    69.6    80.2    84.5
Valparaiso             35.7    57.5    60.5    65.0    106.8   86.1    89.7
Vincennes†             28.0, 37.4, 33.3, 51.9
Wabash                 26.5    40.1    51.3    59.3    73.0    85.0    95.4
Washington†            33.9, 44.3, 56.1, 59.4, 75.2
Whiting†               40.4

Average of all         27.9    44.3    49.2    55.4    68.7    80.3    90.6
Ayres Norms            31.0    44.0    55.0    64.0    71.0    76.0    79.0
Number of cases        5,733   5,437   4,919   4,995   4,425   3,866   3,343

[.. marks a blank or illegible entry; ? marks an illegible digit.]
[† Row incomplete in the source; the legible entries are listed in source order and cannot be assigned to grades.]


The cities marked * began the experiment in 1916, a year earlier than the other cities. 


Table II shows the speed of handwriting in the same manner as 
Table I shows the quality. The speed is based upon the number of let- 
ters written per minute in accordance with the directions on the Ayres 
Scale. In this case it is possible to make direct comparisons with the
norms given in the Ayres Scale, which are given in the next to the last 
line of the table. It should be noted that the curve of progress in 
speed as shown by the Ayres norm is not exactly the same as that 
shown by the Indiana averages. 
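
The comparison with the Ayres norms can be made grade by grade. The sketch below is an illustration only: it subtracts the "Ayres Norms" line from the "Average of all" line of Table II and lists the grades in which the Indiana average exceeds the norm. The variable names are chosen here for the example.

```python
grades = ["2A", "3A", "4A", "5A", "6A", "7A", "8A"]
indiana_speed = [27.9, 44.3, 49.2, 55.4, 68.7, 80.3, 90.6]  # "Average of all", Table II
ayres_speed = [31.0, 44.0, 55.0, 64.0, 71.0, 76.0, 79.0]    # "Ayres Norms", Table II

# Indiana average minus Ayres norm, in letters per minute.
differences = {
    g: round(ind - norm, 1)
    for g, ind, norm in zip(grades, indiana_speed, ayres_speed)
}
above_norm = [g for g in grades if differences[g] > 0]

for g in grades:
    print(f"{g}: {differences[g]:+.1f}")
print("grades above the Ayres norm:", above_norm)
```

On these figures the Indiana averages lie below the norm in most of the lower grades and above it in the seventh and eighth.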

After the discussions at the Conference on Supervision a day or 
two ago, in which the word speed was given rather a bad reputation, 
particularly as applying to the subjects of arithmetic and reading, one 
should perhaps be somewhat hesitant to enter into a discussion of speed
as a desirable quality in handwriting. We have not as yet completed
the study based on these data of the effect of increased speed on qual- 
ity, and the effect of increased quality on speed. That section of the 
study will be completed at a later date. We do have, however, the 
possibility of comparing directly the curves for increase of quality and 
for increase of speed as they are shown in the grade averages. It is 
possible to make this comparison directly from the figures given in 
Tables I and II, but for convenience these averages are presented in a
single table, Table III.


TABLE III. COMPARISON OF AVERAGES BY GRADES OF SPEED AND QUALITY
BASED ON ALL CASES


Grade      Average Quality     Average Speed
2A              29.8               27.9
3A              36.9               44.3
4A              43.0               49.2
5A              47.9               55.4
6A              46.5               68.7
7A              50.8               80.3
8A              53.2               90.6





Table III is a comparison of the averages by grades of speed and 
quality based upon all the cases in the study. Table IV presents the 
same facts for the 275 cases having a continuous run of samples; i.e., 
these averages are based upon the 275 pupils from whom samples were 
secured each of the seven years of the experiment. 


TABLE IV. COMPARISON OF MEANS OF SPEED AND QUALITY BASED ON 275
CONTINUOUS CASES


Grade      Average Quality     Average Speed
2A              29.5               27.3
3A              37.6               42.4
4A              43.5               53.6
5A              48.7               59.1
6A              48.4               77.7
7A              50.7               82.3
8A              52.7               90.5


For the purpose of more graphic presentation these figures have 
been plotted as curves. 












[Charts I and II: curves of average quality and average speed by grade, 2A to 8A.]





Chart I shows grade averages for speed and for quality based upon 
all the cases from which samples were secured, i.e., the data shown in 
Table III. No attempt has been made to present these two factors in 
terms of comparable units on this chart. Thus, points on one curve 
have no reference whatever to points on the other curve. They should 
be compared only with points on the same curve. However, the re- 
spective directions taken by the curves are significant and permit com- 
parison. Perhaps the most striking feature of the curves occurs at the 
sixth grade. There is a distinct slump in quality at this point and a 
much less rapid increase in quality ever after. At the same time that 
this slump occurs at the sixth grade in quality, we have a large in- 
crease in speed, an increase which persists thru the following grades. 
We should not, however, arrive too hastily at a conclusion concerning 
the relationship between the speed and quality as shown by these data. 
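
The slump described above can be read directly from the grade-to-grade changes in Table III. The following sketch is offered only as an illustration of that arithmetic; the helper name `gains` is introduced here and is not part of the study.

```python
grades = ["2A", "3A", "4A", "5A", "6A", "7A", "8A"]
quality = [29.8, 36.9, 43.0, 47.9, 46.5, 50.8, 53.2]  # Table III, average quality
speed = [27.9, 44.3, 49.2, 55.4, 68.7, 80.3, 90.6]    # Table III, average speed


def gains(series):
    """Change from each grade to the next, rounded to one decimal."""
    return [round(later - earlier, 1) for earlier, later in zip(series, series[1:])]


steps = [f"{a}->{b}" for a, b in zip(grades, grades[1:])]
quality_gain = dict(zip(steps, gains(quality)))
speed_gain = dict(zip(steps, gains(speed)))

for step in steps:
    print(f"{step}: quality {quality_gain[step]:+.1f}, speed {speed_gain[step]:+.1f}")

# Steps at which the average quality falls instead of rising.
slumps = [step for step in steps if quality_gain[step] < 0]
print("quality falls at:", slumps)
```

The only step at which quality falls is the one from 5A to 6A, and the speed gain at that step is larger than at the two steps before it.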

Chart II shows exactly the same data for the 275 continuous cases, 
being based upon the data in Table IV. You will note again the pres- 
ence of this sixth grade slump. The slump in quality is not quite so 
pronounced as when all the cases are used, but the increase in speed is 
somewhat greater. The same plateau effect is evident in the quality curve 
after the sixth grade. The persistence of this trait seems to be evi- 
dence of its fundamental character. It is interesting in this connection 
to refer to the norms on the Ayres Scale for quality. In those norms a 
regular progression of four points on the scale is assigned to each grade. 
No difference is made between the increase shown by the sixth grade 
over the fifth and that shown by the third grade over the second, ac- 
cording to the norms on the Ayres Scale. Thus, the Indiana data show 
a situation rather widely at variance with the apparent intention of the 
Ayres Scale to suggest a regular and even progress in quality of hand- 
writing thru the grades, the intervals from grade to grade each being 
exactly the same in the Ayres norms. Other scales, such as the Thorn- 
dike and the Freeman, do not maintain an even progression in their 
norms, but the variation is very small and much more regular than that 
shown by the Indiana data. 

Furthermore, the increase in speed according to the Ayres norm is 
a constantly decreasing amount as progress is made thru the grades, 
i.e., the higher the grade, the lower the interval between its average 
speed and that of the grade immediately preceding it. The amount varies 
from an increase of three letters per minute between the seventh and 
eighth grade standards to thirteen letters per minute between the second 
and third. No indication is given of a burst of speed at the sixth grade 
level in these norms. 
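
The steadily shrinking intervals in the Ayres speed norms, and the contrast with the Indiana averages, can be verified from the last lines of Table II. This is a minimal sketch of that check, with names chosen for the example:

```python
ayres_speed = [31.0, 44.0, 55.0, 64.0, 71.0, 76.0, 79.0]    # "Ayres Norms", Table II
indiana_speed = [27.9, 44.3, 49.2, 55.4, 68.7, 80.3, 90.6]  # "Average of all", Table II


def intervals(series):
    """Increase from each grade to the next, rounded to one decimal."""
    return [round(later - earlier, 1) for earlier, later in zip(series, series[1:])]


def strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


ayres_steps = intervals(ayres_speed)
indiana_steps = intervals(indiana_speed)

print("Ayres intervals:  ", ayres_steps)
print("Indiana intervals:", indiana_steps)
print("Ayres intervals shrink at every grade:", strictly_decreasing(ayres_steps))
print("Indiana intervals shrink at every grade:", strictly_decreasing(indiana_steps))
```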

The study of the Indiana data has not yet revealed how there might 
have occurred such discrepancies between the Indiana averages and the 
standard norms. As a matter of fact, there might be left an open 
question as to whether or not the situation as revealed by the Indiana 
cities really presents more accurately the actual situation than does the 
set of norms furnished by the Ayres Scale. 

Just as one bit of evidence on the question of reliability of the 
averages used in this study, Table V is presented, showing the variations
in the averages for the Indiana data as secured by different methods
of sampling. 


TABLE V. STABILITY OF MEANS OF QUALITY


                   I                   II                     III
Grade        Average Quality     Average Quality of       Average Quality
             of All Cases        275 Continuous Cases     of Class Groups

2A               29.8                 29.5                    ..
3A               36.9                 37.6                    37.4
4A               43.0                 43.5                    42.9
5A               47.9                 48.7                    47.8
6A               46.5                 48.4                    46.0
7A               50.8                 50.7                    52.2
8A               53.2                 52.7                    54.1














In Table V, column I gives the averages based upon all the cases 
irrespective of their cities. These are taken from Table I. Column II 
gives the means of the 275 continuous cases, taken from Table IV. 
Column III gives the average of the means of the various classroom 
groups from which we have a sample of the teacher’s handwriting. 

As mentioned somewhat earlier in this report, an attempt was made 
to sample the handwriting of the teacher as well as that of her pupils. 
In many cases, however, the teacher failed to submit a sample. We 
do, however, have a fairly adequate number of classes from whose 
teachers samples were secured. The data in column III were compiled 
by calculating separately the individual class average from each one of 
these classes represented by a teacher’s sample. Then the average of 
these class averages was taken for each grade, and is here presented 
in column III of Table V. Ordinarily the averaging of averages is a 
rather doubtful statistical procedure. Considerable error is likely to be 
introduced if the averages are not based upon groups of the same size. 
Column III is presented here in relation to the other means, in order 
to show that no great violence has been done the data in thus averaging 
the average of the classes. 
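
The caution about averaging averages can be made concrete with a small example. The two classes below are hypothetical and are not data from the study; they are chosen only to show how far the average of class averages can depart from the average of all pupils when the classes differ in size.

```python
# Hypothetical classes (NOT from the study): (number of pupils, class average quality).
classes = [(40, 50.0), (10, 30.0)]

# Average of the class averages: every class counts equally.
mean_of_means = sum(avg for _, avg in classes) / len(classes)

# Average of all pupils: every class is weighted by its size.
total_pupils = sum(n for n, _ in classes)
pooled_mean = sum(n * avg for n, avg in classes) / total_pupils

print("average of class averages:", mean_of_means)
print("average of all pupils:    ", pooled_mean)
```

The two figures coincide only when the classes are of equal size, or when the class averages themselves happen to be equal.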

Glancing for a moment at Table V, column II, and column III, we 
note that in every case except one the mean of all cases, presented in
column I, lies between the means shown in columns II and III. The 
deviations of columns II and III are believed to be well within their 
respective standard errors. 
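
The statement that column I lies between columns II and III in every grade but one can be checked mechanically. A minimal sketch, using the six grades of Table V for which all three columns are given:

```python
# Table V: grade -> (I all cases, II continuous cases, III class groups).
table_v = {
    "3A": (36.9, 37.6, 37.4),
    "4A": (43.0, 43.5, 42.9),
    "5A": (47.9, 48.7, 47.8),
    "6A": (46.5, 48.4, 46.0),
    "7A": (50.8, 50.7, 52.2),
    "8A": (53.2, 52.7, 54.1),
}


def lies_between(value, a, b):
    return min(a, b) <= value <= max(a, b)


exceptions = [
    grade
    for grade, (col_i, col_ii, col_iii) in table_v.items()
    if not lies_between(col_i, col_ii, col_iii)
]
print("grades where column I falls outside columns II and III:", exceptions)
```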

Let us look now at a third feature to be presented in this study, 
which is a comparison of the handwriting of the teacher and that of the 
average quality for her class. These data are presented in Table VI. 













TABLE VI. COMPARISON OF QUALITY OF TEACHERS’ HANDWRITING WITH AVERAGE
QUALITY OF HANDWRITING OF THEIR RESPECTIVE CLASSES


Grade                              3A       4A       5A       6A       7A       8A

Number of cases                    150      186      194      159      63       48
Average of all classes             37.433   42.925   47.886   46.094   52.206   54.166

Average of classes where
teachers’ quality is:     40       n.c.     39.0     46.33    40.0     n.c.     46.0
                          50       34.90    40.33    45.91    43.57    50.66    50.0
                          60       35.23    39.4     46.11    42.07    49.0     48.66
                          70       37.31    42.76    48.04    46.56    52.35    54.33
                          80       38.18    42.89    47.72    45.74    50.0     53.70
                          90       37.97    43.88    48.85    47.60    52.73    56.87

Average of all teachers            76.666   ..       ..       ..       ..       79.792

[.. marks an entry illegible in the source.]


























One should note first the number of cases upon which the data are 
based. Thus we have 150 teachers and 150 classes in the 3A grade; 
186 teachers and 186 classes in the 4A grade, and so on. The drop in 
number of classes in the seventh and eighth grade corresponds roughly 
to the expected drop in school population in these grades. The average 
of all classes is then given for comparative purposes. These figures 
are the same as those in column III of Table V. 

The next block of figures in Table VI represents an attempt to 
present the average quality of classes whose teachers have a given 
quality of handwriting. Thus, looking at the 3A grade column, we 
note first, that teachers who have a handwriting quality of 50 have 
classes whose average handwriting quality is 34.90; teachers who have 
a handwriting quality of 60 have classes whose average quality is 35.23; 
teachers whose handwriting quality is 70 have classes whose average 
handwriting quality is 37.31, and so on. It happens that in two grades 
(3A and 7A) there were no cases in which teachers had a quality of 
handwriting as low as 40. One can see at a glance from the chart that 
there is some progression in the quality of the class’ handwriting ac- 
companying improvement in the quality of the teacher’s handwriting. 
This progression is evident in all the grades. 

For purposes of more graphic presentation, the same data are
plotted in Chart III. Chart III is broken into two sections, IIIa and
IIIb, IIIa showing grades 3, 4, and 5, and IIIb showing grades 6, 7,
and 8.





[Chart IIIa: average class quality plotted against quality of teachers’ handwriting, grades 3, 4, and 5.]





These charts are drawn to scale, the horizontal lines representing 
the average quality of the various grades, the distances between the 
grade average lines being based in each case upon the same scale. 

At the top of the chart you will note the figures representing quality 
of teachers’ handwriting. In each case the average quality of classes 
of teachers having a given quality of handwriting is plotted as a devia- 
tion from the average of all classes of that grade, the minus deviations 
being dropped below the average line, and the plus deviations rising 
above the average line. The amount of deviation is drawn on the same 
scale as that representing the difference between classes. 

A glance at this chart makes it very evident that teachers whose 
handwriting is below 70, i.e., 60, 50, or 40, on the average have classes 
who are to a considerable extent below the average for classes of their 
grade. There is not, however, a similar increase for teachers whose 
handwriting is above 70. In most cases there is evident a slight superi- 
ority for classes who are fortunate enough to have a teacher whose 
handwriting score is above 70, but this superiority is by no means as 
pronounced as is the inferiority if they have a teacher whose hand- 
writing is below 70. 

For instance, look at the specific case of the fourth grade classes 
whose teachers have a handwriting quality of 40. Their average is 
just barely above that of classes a year below them who are fortunate 
enough to have teachers with a handwriting score of 80. The chart 
is presented in two parts (IIIa and IIIb) because of the tremendous 
overlapping of the minus deviations in the upper grades. 














[Chart IIIb. Grades 6, 7, and 8]


Chart IIIb gives another vision as to possible explanations of this 
sixth grade slump in quality. You will recall that our previous charts 
had suggested a relationship between the burst of speed which comes 
in the sixth grade and the slump in quality. 

Note, however, where the great bulk of this slump comes. Teachers 
whose handwriting is below 70 apparently contribute to this slump to a 
much greater extent than do teachers whose handwriting is above 70. 
On Chart IIIb, the fifth grade average is also ruled in so that com- 
parisons may be made as to the total slump from the average reached 
in the fifth grade. 

Perhaps the question which now arises in one’s mind is this: “How 
many teachers are there who have such low handwriting scores?” For 
that purpose the last line in Table VI gives the averages of all the 
teachers by grades, and on Charts IIIa and IIIb an arrow points in 
each grade to the average score of the teachers on a horizontal scale. 
Is it at all significant that the sixth grade teachers have the lowest 
average quality of the teachers of any grade? 

This question is raised for the express purpose of throwing some 
doubt onto the wisdom of laying the entire responsibility of the sixth 
grade slump in quality at the door of “speed”. It is entirely within 
the range of possibility that there may be several other factors which 
we have not yet discussed which may have a bearing upon the situation. 
It is interesting to note also that seventh and eighth grade teachers, 
grades where handwriting is ordinarily not considered such an important 
subject of instruction, have the highest average quality of handwriting. 

This whole matter of the relationship of the teachers’ quality to 
that of their respective classes is worthy of a much more profound study 
than our preliminary discussion of these data has produced. It would 
seem, however, that two or three things stand out. 

First, that the relationship is not rectilinear, being more pronounced 
at the lower end of the distribution than at the upper end. 

Second, it would seem that the critical point for the quality of the 
teacher’s handwriting is in the neighborhood of 70. Some of you will 
recall that Dean Breitwieser, in the teacher training conference last 
Thursday evening, from this platform proposed that eliminations from
teacher training classes be based to some extent upon the ability of the 
prospective teacher to write a readable hand on the blackboard. We are 
not sure that in sampling the teacher’s handwriting we have secured a 
fair measure of her ability to write on the blackboard. Assuming, how- 
ever, that there would be a high degree of relationship between the 
samples as secured and blackboard handwriting, we could then supple- 
ment Dean Breitwieser’s remark by stating that the critical point at 
which exclusion should be considered would be a quality of 70 on the 
Ayres Scale. This critical point should also be of interest to superin- 
tendents and employers of elementary teachers. 

The third point to which attention might be called in this matter 
of the relationship of the teacher’s handwriting to that of her class is 
the effective distribution of teachers whose handwriting is superior. 
This is not a matter that is settled by this study. It has only raised
the issue, particularly in connection with the fact that the lowest average
teacher’s quality was found in the sixth grade, and that there was
a concomitant slump in the average quality of classes. The presence
of this question is also indicated by the fact that the highest average
quality for teachers was found in the seventh and eighth grades.

Going back, in summary, to the matter of the trends in speed and
trends in quality thru the grades, the most pronounced findings of this
study at present are a slump in the sixth grade in quality, which is
persistent, no matter upon what selected groups the averages are based.
Accompanying the slump in quality at the sixth grade is a greatly increased
speed which is also a persistent factor thru all the averages.

The tables of averages presented in Table I and Table II are for
study with the hope that reference to particular school systems will
be valuable for purposes of comparison and possible effort at improvement.











Suggestions on Value and Use of Accumulated 
Records of Group Intelligence Tests 


HERMAN H. YOUNG, Professor of Psychology, Indiana University


THE purpose of this study is to investigate the regularity or con- 
stancy with which children may be expected to remain in a given section 
of a class when classified thru the use of group intelligence tests. The 
data consist of the results of four different group intelligence tests given 
to 96 children at different times within a period of 26 months. The 
tests and dates upon which they were given are: 

1st Test—Indiana University Schedule F—November, 1922. 

2d Test—Kingsbury Primary Scale A, Form 1—April, 1923. 

3d Test—Indiana University Schedule E—April, 1924. 

4th Test—National Intelligence Test, Scale A, Form 1—January, 1925.

This makes the interval of time between the first and second tests five 
months; between the second and third tests, 12 months; and between 
the third and fourth tests, 9 months. The first two tests are considered 
as non-language group intelligence tests, and the last two as language 
group intelligence tests. All tests have been standardized by the method 
of standard percentiles described in a paper presented two years ago 
at this conference by the writer.* Thru the use of the standard per- 
centile tables all scores were translated into percentiles. These form 
the basis for comparison between tests, between children, and for the 
forming of sections. 

The records of the 96 children which are reported in this paper 
were selected because all children received the above mentioned four 
tests in the order and at the time scheduled, and because they are all 
nearly the same age. In January, 1925, when the fourth test was given 
the youngest child was 9 years and 10 months old, while the oldest 
child was only 10 years and 9 months old. This brings it about that the 
greatest difference in age between the youngest and oldest child is 12 
months or only one year. 

The records of the group intelligence tests on these children were
recorded and kept on the permanent record cards described at this conference
one year ago.† In working up this report the following data
were calculated for each child:

1. Percentiles on each test.

2. The median (or average) percentile for each pupil for all possible
combinations of two tests.

3. Median percentile for each pupil for all possible combinations
of three tests.

* “How to Interpret and Make Use of Mental Tests.” Tenth Conference on Educational
Measurements, Bulletin of the Extension Division, Indiana University, Vol. VIII,
No. 11, July, 1923.

† “How to Keep Usable Permanent Records of Mental and Achievement Tests.”
Bulletin of the School of Education, Vol. 1, No. 3, January, 1925.

The median percentile of two tests is the same as the average. The
median percentile of three or more tests is that percentile above which
and below which one-half the percentiles for each pupil lie. To find
the median percentile, all the percentiles made by a child are arranged
in order of size from the highest to the lowest. When a child has an
odd number of percentiles the middle one is taken as his median percentile.
The second percentile of those arranged in rank order is the
median of three percentiles; thus, 11 is the median of 5, 11, and 13.
The third percentile of the rank order arrangement of five tests is the
median; thus, 89 is the median of 54, 73, 89, 90, and 92. When a child
has an even number of percentiles the average of the two middle ones,
when they are arranged in rank order, is the median percentile. The
average of the second and third percentiles in the rank order of four
percentiles is the median percentile; thus the average of 42 and 60,
i.e., 51, is the median of 41, 42, 60, and 90. The average of the third
and fourth percentiles in the rank order of six percentiles is the median
percentile; thus the average of 14 and 30, i.e., 22, is the median of 6,
8, 14, 30, 31, and 70. The median was employed because it may be
easily calculated, and because it has reliability and fairness greater than
any other value so easily calculated. It is probable that a complex
formula involving considerable calculation might produce statistically
more reliable results, but it would be too complicated to permit of being
used as readily as is necessary in this work.
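
The rule for finding the median percentile can be stated compactly in code. The following is a minimal sketch in Python; the function name `median_percentile` is chosen here for illustration and does not come from the paper. It reproduces the four worked examples given in the paragraph above.

```python
def median_percentile(percentiles):
    """Median of one child's percentiles, following the rule in the text."""
    if not percentiles:
        raise ValueError("at least one percentile is required")
    # Rank order; ascending or descending order gives the same median.
    ranked = sorted(percentiles)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of percentiles: take the middle one.
        return ranked[middle]
    # Even number: average the two middle ones.
    return (ranked[middle - 1] + ranked[middle]) / 2


# The worked examples from the text.
examples = {
    "three": median_percentile([5, 11, 13]),
    "five": median_percentile([54, 73, 89, 90, 92]),
    "four": median_percentile([41, 42, 60, 90]),
    "six": median_percentile([6, 8, 14, 30, 31, 70]),
}
```

With two percentiles the same function returns their average, which agrees with the statement that the median of two tests is the same as the average.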

The purpose of this investigation is to determine how records of 
group intelligence tests may be most reliably interpreted and used in 
the classification of children in the ordinary public schools. It is with 
this in mind that the 96 children have been classified by various methods 
to determine the stability of the sections to which they would be as- 
signed by the different methods of classification. In every classification 
the children were divided into three equal sections of 32 children each. 
The 32 children rating the highest by a chosen method were always
assigned to the X section; the 32 children rating next highest by the 
same method were assigned to the Y section; and the other 32 children 
rating the lowest by the same method were assigned to the Z section. 
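
The division into three equal sections can be sketched as follows. This is an illustration only: the function name `assign_sections` and the rule for breaking ties (by child identifier) are assumptions of the sketch, since the paper does not say how tied ratings were handled.

```python
def assign_sections(ratings, labels=("X", "Y", "Z")):
    """Split children into equal sections by rating, highest ratings first.

    ratings: dict mapping a child identifier to a percentile
             (or to a median percentile of several tests).
    Ties are broken by child identifier; this is an assumption of the
    sketch, not a rule taken from the paper.
    """
    if len(ratings) % len(labels) != 0:
        raise ValueError("children must divide evenly into the sections")
    size = len(ratings) // len(labels)
    # Highest rating first; identifier as a deterministic tie-breaker.
    ranked = sorted(ratings, key=lambda child: (-ratings[child], child))
    return {child: labels[rank // size] for rank, child in enumerate(ranked)}
```

With 96 children and three labels, each section receives 32 children, as in the classifications described above.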

The children were first divided into X, Y, and Z sections on the 
basis of the first test given them. This is what would have happened 
to them had they been classified at the time the test was given. They 
were then divided into X, Y, and Z sections on the basis of the second 
test because this is the way they would have been classified had the 
results of the first test been disregarded or thrown away before the 
second test was given. They were then classified on the basis of per- 
centiles of the third test and again on the basis of percentiles of the 
fourth test, because this is what would have happened after each test 
had the results of all previous tests been ignored. 




















This made it possible to determine exactly how many children were
put into the same section by the second test into which they were put by 
the first test, and how many were shifted into other sections and into 
which sections they were shifted by the second tests. This same thing 
was done for the third and fourth tests; the first test was always used 
as the point of reference. 


TABLE I. DISTRIBUTION OF 96 CHILDREN IN X, Y, AND Z SECTIONS BY EACH OF
FOUR DIFFERENT GROUP INTELLIGENCE TESTS

                                                          Test                      Average of
Test F Section                      Section  A            E            N            Columns A, E, N

The 32 children put in Section X    X        23           21           20           22
by Test F were put in sections as   Y        8            9            11           9
follows by the other tests:         Z        1            2            1            1

The 32 children put in Section Y    X        7            10           11           9
by Test F were put in sections as   Y        14           10           12           12
follows by the other tests:         Z        [illegible]  10           11           11

The 32 children put in Section Z    X        2            [illegible]  1            [illegible]
by Test F were put in sections as   Y        10           [illegible]  [illegible]  11
follows by the other tests:         Z        [illegible]  [illegible]  20           20





Table I shows how the children were redistributed by each of the 
last three tests from the sections into which they were put by the first 
test. Column 3 shows that of the 32 children put in the X section by 
the first test, 23 were put in the X section by the second test, 8 into 
the Y section, and 1 into the Z section. Column 4 shows that of the 
32 children put into the X section by the first test, 21 were put in the 
X section by the third test, 9 into the Y section, and 2 into the Z sec- 
tion. Column 5 shows that of the 32 children put into the X section 
by the first test, 20 were put into the X section by the fourth test, 11 
into the Y section, and 1 into the Z section. The last column of the 
table is an average of the three preceding columns. It shows that, on 
an average, 22 children remained in the X section, 9 were shifted to 
the Y section, and 1 to the Z section. The remainder of Table I shows 
what would have happened upon reclassification by the last three tests 
to the children assigned to the Y and Z sections by the first test. 
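
The tabulation behind a table of this kind can be sketched as a simple cross-count. The function name `section_shifts` and the four-child data set below are hypothetical and serve only to show the shape of the calculation; they are not the study's records.

```python
from collections import Counter


def section_shifts(reference, other, labels=("X", "Y", "Z")):
    """Count children by (section on reference test, section on other test).

    reference, other: dicts mapping child identifier to section label.
    Returns a nested dict: result[reference_section][other_section].
    """
    counts = Counter((reference[child], other[child]) for child in reference)
    return {ref: {new: counts[(ref, new)] for new in labels} for ref in labels}


# Hypothetical sections for four children on two tests.
by_first_test = {"a": "X", "b": "X", "c": "Y", "d": "Z"}
by_second_test = {"a": "X", "b": "Y", "c": "Y", "d": "X"}
shifts = section_shifts(by_first_test, by_second_test)
```

Each inner dict corresponds to one block of three rows in the table: the children placed in one section by the reference test, distributed over the sections given them by another test.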























[Chart 1. Groupings by Single Tests]


The numbers from the last column of Table I furnish the data out 
of which Chart 1 is constructed. Chart 1 therefore shows the average
result of reclassification of the children by the last three tests. The 
first column at the left of the chart gives the name of the section. The 
second column shows the number of children assigned to each section 
on the basis of the first test. The third column shows the average 
amount of redistribution of the children assigned to the X section by 
the first test. It shows that 20 of the 32 children put in the X section 
by the first test are on an average put into the X section by the other 
tests, and that 11 children are on an average shifted to the Y section 
by the other tests, and that 1 child is shifted to the Z section. The 
fourth column shows that 11 of the 32 children put in the Y section by 
the first test are on an average put into the X section by the other tests, 
and that only 12 children on an average remain in the Y section and 
that 9 children are shifted into the Z section. The last column shows 
that 22 of the 32 children put in the Z section by the first test remained 
on an average in the Z section by the other tests, and that 9 children 
were shifted into the Y section, and 1 up into the X section. The 
children remaining in the sections to which they were originally assigned 
are shown by the numbers on the diagonal, 20 in the X row, 12 in the 
Y row, and 22 in the Z row, making a total of 54 children. This is 
the average number of children who did not change from one section to 
another on the basis of different tests. It must, however, be remem- 
bered that whereas on an average 54 children remained in the section 
to which they were originally assigned, it was not the same 54 children. 
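
The figures quoted in the preceding paragraph can be checked with a short calculation. The sketch below enters the numbers exactly as the paragraph gives them for Chart 1; the variable and function names are chosen here for illustration.

```python
# Average redistribution as described in the text for Chart 1.
# Outer key: section assigned by the first test.
# Inner key: section assigned, on an average, by the other tests.
chart_1 = {
    "X": {"X": 20, "Y": 11, "Z": 1},
    "Y": {"X": 11, "Y": 12, "Z": 9},
    "Z": {"X": 1, "Y": 9, "Z": 22},
}


def unchanged(table):
    """Children who stay in their original section (the diagonal)."""
    return sum(table[section][section] for section in table)


def group_sizes(table):
    """Number of children originally assigned to each section."""
    return {section: sum(row.values()) for section, row in table.items()}


total_unchanged = unchanged(chart_1)
sizes = group_sizes(chart_1)
```

The diagonal sums to 54, the total stated in the text, and each original section accounts for 32 children.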

In the second method of classification the children were divided into 
X, Y, and Z sections on the basis of the median (average) percentile 
of the first two tests given them. This is what would have happened 
had they been classified after the second test was given providing the 
results of the first test had been kept and treated as of equal value 
with those of the second test in making the classification. They were 
then divided into X, Y, and Z sections on the basis of the medians of 
each of all possible combinations of two tests out of the total of four 
tests. There were six such combinations possible. Listed in order they 
are combinations of tests: FA, FE, FN, AE, AN, and EN. This was 
done because the children might have been classified by any of these 
possible combinations providing the results of only two of the tests had
been available at the time of classification. The purpose of classifying
them on the basis of two tests was to discover whether there would be as 
much interchange between sections as would have occurred with classifi- 
cations based on only one test. The average number of shiftings from 
sections of classification on the basis of the average of the first two 
tests was calculated and is presented in Chart 2. Chart 2 is con- 
structed and is to be read exactly like Chart 1 except that its classifica- 
tions are based on the average of two tests instead of classifications 
based on only one test. It shows that on an average no child was shifted 
from the X section to the Z section, nor from the Z to the X section. 
The numbers on the diagonal, the 24 in the X row, the 18 in the Y 
row, and the 25 in the Z row show that on an average 67 children re- 
mained in the sections to which they were originally assigned. In this 
connection it must also be remembered that the personnel of these 67 
children does not remain the same thruout the various classifications. 


[Chart 2. Groupings by Medians of Any Two Tests]


In the third method of classification the children were divided into 
X, Y, and Z sections on the basis of the median percentile of the first 
three tests given. This is what would have happened had they been 
classified after the third test was given and had they been classified on 
the basis of the results of the first three tests. They were then redivided 
into X, Y, and Z sections on the basis of the medians of each of all pos- 
sible combinations of three tests out of the total of four tests. There 
were four such combinations possible. Listed in order they are combina- 
tions of tests, FAE, FAN, FEN, and AEN. This was done to find out 
what might have happened had the records of only three of these tests 
been kept. The average number of shiftings between sections from the 
original classification on the basis of the average of three tests was 
calculated and is presented in Chart 3. Chart 3 is constructed and is 
to be read exactly like Chart 1 except that it is based upon classifica- 
tions by the median of three tests instead of classification on the basis 
of only one test. 





[Chart 3. Groupings by Medians of Any Three Tests]


It shows that on an average no child was shifted from the X sec- 
tion to the Z section nor from the Z section to the X section, and that 
the children remaining in the sections to which they were originally 
assigned as shown by the numbers on the diagonal, 26 in the X row, 
22 in the Y row, and 28 in the Z row, make a total of 76 children. 
Again it must be remembered that the personnel of these 76 children is 
not the same thruout the various classifications. 

Chart 1 shows that classification on the basis of a single group test 
brought it about that on an average only 54 children remained in the 
sections to which they were originally assigned when reclassified on 
the basis of any other single test; Chart 2 shows that classification on 
the basis of the median of two group tests brought it about that on an 
average 67 children remained in the section to which they were originally 
assigned when reclassified on the basis of the median of any other two 
group intelligence tests; Chart 3 shows that classification on the basis 
of the median of three group intelligence tests brought it about that 
on an average 76 children remained in the same sections to which they 
were originally assigned when reclassified upon the basis of the median
of three group intelligence tests. This increase in the number of children
who remained in the sections to which they were originally assigned
is evidently an index of the accuracy with which children may be
assigned to sections upon the basis of one, two, or three group intelli- 
gence tests. It shows that approximately 50 per cent more children are 
located by the first classification in sections where they will remain upon 
reclassification when three tests are used than when only one test is 
used. Of course it must be remembered that these data, being based 
upon only four tests, tend to give an advantage to the classifications 
when three tests are used because they go within the circle of these 
four tests. 










TABLE II. NUMBER OF CHILDREN ALWAYS ASSIGNED TO SAME SECTION BY
EACH OF THREE METHODS OF CLASSIFICATION

                        Method of Sectioning
Section     Single Test     Medians of any 2 Tests     Medians of any 3 Tests

X               10                   16                         21
Y                3                    8                         16
Z               16                   18                         26

Total           29                   42                         63


As another method of determining the relative constancy with which
children would always be assigned to the same section upon reclassification
by a chosen method thru the use of another test or another series
of tests, calculation was made of the number of children who were
always assigned to the same section. Table II shows the results of this 
calculation. The first column gives the title of the section; the sec- 
ond column shows that whereas 32 children were always assigned to 
the X section when classified by each of the single tests, only 10 chil- 
dren were always classified in the X section by all four tests; that only 
3 of the 32 children assigned to the Y section by each test were always 
assigned to the Y section by all four tests; and that 16 of the 32 
children assigned to the Z section by each test were always assigned 
to the Z section by all four tests. The third column gives the same 
type of calculation as the second, except that it shows how many chil- 
dren were always assigned to the same section when classifications were 
based on the medians of any two tests, and the fourth column gives the 
same data for classifications based on the medians of any three tests. 
The totals of each of these columns, the 29, the 42, and the 63, are 
an index of the relative reliability of classifications based upon the use 
of one, two, or three tests. It is undoubtedly of considerable signifi- 
cance that more than twice as many children remained in the same 
sections when three tests were employed as when only one test was 
employed. Especially is this true since less than one-third of the chil- 
dren remained in the sections to which originally assigned when re- 
classified by one test, whereas nearly two-thirds remained when classi- 
fied by three tests. The fact that only four tests were employed in 
making these calculations gives an advantage to the classifications based 
on three tests because two tests were always duplicated in every com- 
bination of three. 
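
The count of children "always assigned to the same section" can be sketched as follows. The function name `always_same_section` and the four-child data are hypothetical; the sketch assumes each classification is available as a mapping from child to section.

```python
def always_same_section(classifications, labels=("X", "Y", "Z")):
    """Count children placed in the same section by every classification.

    classifications: list of dicts, one per classification (for example,
    one per single test), each mapping child identifier to section label.
    """
    first, rest = classifications[0], classifications[1:]
    stable = {label: 0 for label in labels}
    for child, section in first.items():
        # A child counts only if every other classification agrees.
        if all(other[child] == section for other in rest):
            stable[section] += 1
    stable["Total"] = sum(stable[label] for label in labels)
    return stable


# Hypothetical sections for four children under three classifications.
example = always_same_section([
    {"a": "X", "b": "Y", "c": "Z", "d": "X"},
    {"a": "X", "b": "Z", "c": "Z", "d": "Y"},
    {"a": "X", "b": "Y", "c": "Z", "d": "X"},
])
```

Applied to the four single-test classifications, to the six two-test classifications, and to the four three-test classifications in turn, a count of this kind would yield the three columns of Table II.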

Since only four tests have been employed in making the classifica- 
tions and reclassifications reported here, the absolute numbers dare not 
be taken at face value to indicate the relative reliability of one test 
compared with three, because the classifications based upon three tests 
are restricted to the limited possibilities afforded by only four tests. 













For that reason conclusions based upon this investigation must be con- 
servative, yet the increase in the number of children correctly classified 
when three tests were employed over those correctly classified by one 
test suggests the relative superiority of three tests over one. The classi- 
fications made when each test was taken as a separate basis of classi- 
fication are entirely free from the objection raised above because the 
result was four different classifications, each one independent of the 
other. What happened in this case is very significant as is indicated 
by a little study of Chart 1 and the second column of Table II. Chart 1 
shows that on an average more than one-third of the children are shifted 
out of the X section, slightly less than one-third are shifted out of the 
Z section, and nearly two-thirds are shifted out of the Y section when 
reclassified upon the basis of only one test. Column 2 of Table II shows 
that less than one-third of the children were always assigned to the 
X section by all four tests, that only one-tenth of the children remained 
in the Y section, and that exactly one-half of the children remained in 
the Z section. 
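
The fractions quoted in this paragraph and the one before it follow directly from the totals in Table II and the Chart 1 figures given in the text. A short check, with variable names chosen here for illustration:

```python
CHILDREN = 96
SECTION_SIZE = 32

# Totals and single-test column of Table II.
table_2_totals = {"one test": 29, "two tests": 42, "three tests": 63}
table_2_single = {"X": 10, "Y": 3, "Z": 16}

# Children remaining in each section on an average, as given for Chart 1.
chart_1_unchanged = {"X": 20, "Y": 12, "Z": 22}

# Share of all 96 children always assigned to the same section.
share_always_same = {k: n / CHILDREN for k, n in table_2_totals.items()}

# Share of each section of 32 always kept there by all four single tests.
share_single = {s: n / SECTION_SIZE for s, n in table_2_single.items()}

# Share of each section shifted out, on an average, by another single test.
share_shifted = {
    s: (SECTION_SIZE - n) / SECTION_SIZE for s, n in chart_1_unchanged.items()
}

ratio_three_to_one = table_2_totals["three tests"] / table_2_totals["one test"]
```

These give about 0.30 and 0.66 of the 96 children for one and three tests, a ratio of about 2.2 between them, and shares of 0.375, 0.625, and about 0.31 shifted out of the X, Y, and Z sections, in agreement with the statements in the text.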

With this set of conditions it is hardly to be expected that cor- 
relations of data of one group intelligence test with any other item of 
data such as teachers’ judgments, school progress, another group intelligence
test, a group educational test, etc., would be very high. The
shiftings of the children in their ratings on one test from those of another
are so great that we could hardly expect to get consistent correlations
regardless of the accuracy of the other ratings with which
correlation might be calculated, such as teachers’ judgments of their 
children. Whether this is evidence against the ability of teachers to 
rate their children or the capriciousness of test results is an open 
question. 

It is of interest to note that more children were consistently placed 
in the Z section than in any other section. 

It would seem that if real progress is to be made in the use of 
group intelligence tests, it is necessary to determine from them
some index which will remain fairly constant for a given individual in 
order that it may serve as a point of reference for comparison with 
other data. The results of one group intelligence test obviously do not 
furnish this index. It seems to be the impression of some workers in 
this field that it is not so important whether the individual pupil is 
accurately located or not and that the chief use of tests is for com- 
paring groups with each other. This attitude would be valid were it 
not for the fact that the teacher is required to deal with individual 
children, even tho she attempts to handle them in groups. Her problem 
usually is not that of her class, but that of certain individuals within 
her class. 





The Effect of Population upon Ability to 
Support Education 


HAROLD F. CLARK, Associate Professor of Education, Indiana University


This paper was read at the meeting, but is not published herewith, 
inasmuch as a more detailed presentation of the same subject will 
appear as the next number of this series—the September issue of the 
Bulletin of the School of Education.—Editor. 


A Measure of the Latin Element in Thorndike’s
Teacher’s Word Book


EDWARD Y. LINDSAY, Instructor in Latin and Greek, Indiana University


THE full title of the present paper should be: “A Measure of the
Utility of Lodge’s Vocabulary of High School Latin for interpreting the
Latin element in Thorndike’s Teacher’s Word Book”, or to put it more
briefly but less exactly, the title might be stated: “The Relationship
between Lodge’s Vocabulary of High School Latin and Thorndike’s
Teacher’s Word Book”.

Both of these books came into being as a result of a single tendency 
in modern educational progress; namely, the tendency to concentrate 
attention and energy on minimum essentials in order to insure effective 
results in the processes of teaching and learning. But once it has been 
decided to concentrate upon minimum essentials, the question at once 
arises: What are the minimum essentials? It was to answer the ques- 
tion: What are the minimum essentials in Latin Vocabulary? that 
Lodge’s Vocabulary of High School Latin was produced; and it was to 
answer the question: What are the minimum essentials in English 
vocabulary? that Thorndike’s Teacher’s Word Book was produced. The 
authors of the two books mentioned bear witness to the preceding sen- 
tence in their prefaces. 

Professor Lodge, on pages iii and iv of his Preface, sets forth the 
purpose of the Vocabulary of High School Latin in the following words: 
“...the acquisition of vocabulary presupposes a knowledge of what
vocabulary is of most value, and it is in this that our teaching has been 
handicapped up to the present time.... Our habit has been to read 
Caesar, Cicero, and Vergil with the hope that by constant thumbing 




of the lexicon the student may gradually acquire the command of a 
certain vocabulary.... But experience has proven that most students 
taught after this fashion obtain very little actual ability in reading 
Latin.... The aim of the present book is to set forth the complete 
vocabulary of Caesar De Bello Gallico, Books I-V; Cicero, the six ora- 
tions usually read in schools; and Vergil’s Aeneid, Books I-VI. Statis- 
tics are given of the number of times every word occurs, and a selec- 
tion of 2,000 words has been made, comprising with few exceptions the 
words of most frequent occurrence, arranged so they can be taught at 
the rate of so many per year.” 

Professor Thorndike, on page iv of the Teacher’s Word Book, states: 
“...the list as it stands is far better than any that has hitherto been
available and will be a help to all teachers in estimating the common- 
ness and importance of words. The conscientious and thoughtful teacher 
now spends much time and thought in deciding what pedagogical treat- 
ment to use in the case of the words that offer difficulty to pupils.... 
This Word Book helps the teacher to decide quickly which treatment is 
appropriate by telling her just how important any word is.” Thus 
Lodge for Latin and Thorndike for English have each attempted to
set forth minimum essentials in vocabulary. 

Before considering the relationship that may exist between the 
Latin and English words thus presented as minimum essentials in 
vocabulary, it is well to review very briefly the main facts in regard 
to each of these books. 

Lodge’s Vocabulary of High School Latin, published by Teachers 
College, Columbia University, first appeared in 1907. It was based on 
a count of all the Latin words used in the selections from the writings 
of Caesar, Cicero, and Vergil that are generally accepted as the stand- 
ard Latin reading material for the high school. In this Latin reading 
matter the total number of word occurrences was found to be 77,142; 
the total number of words was found to be 4,650. But of these 4,650 
Latin words only 1,954 occur five times or more in the reading matter 
considered. To this list of words occurring five times or over, 46 other 
words of importance were added to make the list an even 2,000. It is 
recommended that pupils studying high school Latin memorize these 
2,000 words at the rate of 500 words per year thru the four years of 
the course. In Lodge’s book the words are printed in distinctive type 
to indicate in which year of high school each should be learned. They 
are also arranged in two lists, one alphabetical and the other in the 
order in which a pupil will encounter them in the course of his read- 
ing progress. If anyone be inclined to question whether these 2,000 
words are of value in reading Latin other than that on which the list 
was based, he can find an answer to his question on pages iv and v of 
the preface of the Vocabulary of High School Latin. 

After quoting the figures just mentioned, Professor Lodge continues: 
“Several interesting results accrue from these figures. The number of 
words occurring five times or more is surprisingly small. Furthermore 
these words are the essential words in the Latin language; for examination 
of a relatively equal amount of material selected from Caesar’s 
Civil War, Cicero’s Orations other than those read in schools, and Ovid, 
showed the occurrence of more than nine-tenths of these words.... A 
student who has at his command these 2,000 words will have the vocab- 
ulary of fully nine-tenths of all the ordinary Latin that he is likely 
to come in contact with. He will really have much more, because the 
remaining tenth contains a large proportion of compounds of words 
already known.” 

As a basis for progress in reading Latin the value to high school 
pupils of memorizing Lodge’s 2,000 Latin words cannot be seriously ques- 
tioned. However, progress on the part of pupils in the reading and 
translation of Latin literature is not the only thing that is required 
of the study of Latin in the high school. In the course of the Classical 
Investigation recently conducted, a number of questionnaires were sent 
out with the purpose of ascertaining the general opinion prevalent in 
regard to the value of the study of Latin in the high school. I quote 
from Part I, General Report of the Classical Investigation, published 
by the Princeton University Press. On page 42 we are told that 98 
per cent of the high school teachers who responded agreed that “in- 
creased ability to understand the exact meaning of English words de- 
rived directly or indirectly from Latin, and increased accuracy in their 
use” is a valid objective for the course in high school Latin as a whole, 
and by vote of the teachers it was ranked as third objective of im- 
portance in the first year of high school; as of first importance in the 
second year; as of fourth importance in the third year; and as sixth 
in importance in the fourth year. On page 73 we are told that of high 
school pupils questioned who were then taking their fourth year of 
Latin, in giving their reasons for continuing their study of Latin thru 
the four years, 47 per cent said they had to have it for college entrance, 
but also 47 per cent said they continued their study because they found 
Latin helped in English, especially in vocabulary. On page 75 it is 
stated that 93 per cent of the college graduates questioned believed 
that their ability for understanding and using English words derived 
from Latin had been improved by their study of Latin in the high school. 
On page 78 the Advisory Committee of the American Classical League 
enumerates ten objectives which are considered valid for the course in 
Latin for the secondary school as a whole. The first of these objectives 
is, naturally: “increased ability to read and understand Latin”. The 
second is: “increased understanding of those elements in English which 
are related to Latin”. 

On pages 133-134 the Committee, in making recommendations in 
regard to vocabulary, says in part: “The vocabulary to be thoroughly 
mastered during each year of the course should be selected for the 
purpose of providing the conditions most favorable both for the pro- 
gressive development of power to read and understand Latin and for 
attainment of the ultimate objectives which teachers consider valid for 
their pupils and which depend for their attainment upon vocabulary 
content. For the purpose of developing power to read Latin, frequency 
of occurrence in the Latin to be read is the most important factor in the 
selection of the vocabulary to be emphasized. ... There is ... general 
agreement that the one most important ultimate objective which is dependent 
upon vocabulary for attainment is increased ability to understand 
the exact meaning of English words derived directly or indirectly 
from Latin and increased accuracy in their use.” As said before, the 
value of the Lodge Latin list cannot be seriously questioned as a basis 
for progress in reading and understanding Latin; one would be justified, 
however, in questioning whether Lodge’s list has a value at all similar 
as a basis for progressive development in understanding the exact meaning 
of English words derived directly or indirectly from Latin. In other 
words, are there enough commonly used English words derived from the 
Latin words in Lodge’s list to justify their study as a help in under- 
standing English? One of the three objects attempted by the writer of 
this paper in making the study upon which this paper is based was to 
find an answer to that question. 

It was noted a few moments ago that the majority of high school 
pupils, college graduates, and especially high school teachers of Latin 
are agreed that one of the most important ultimate objectives to be 
sought in the teaching of Latin in the high school is increase in the 
understanding of English words derived from Latin. And yet the 
Report of the Classical Investigation shows that there is a serious dis- 
crepancy between theory and practice on the part of the teachers. On 
page 174 it is stated that only 54 per cent of the pupils who filled out 
questionnaires reported that they were frequently asked questions di- 
rected toward increasing their ability to apply their knowledge of Latin 
in interpreting English derivatives. On page 179 are given the sta- 
tistics on sets of Latin examination questions that were subjected to 
analysis. Of the sets of questions for the first year of high school 
Latin, 42 per cent contained questions on English derivatives; for the 
second year, 27 per cent contained questions on English derivatives; 
for the third year, 23 per cent; and for the fourth year, 29 per cent. 
The greatest discrepancy between theory and practice is to be observed 
in the second year; it will be recalled that the consensus of opinion of 
the teachers was that derivative study in the second year should have 
first place, and yet only 27 per cent of the sets of questions contained 
any questions on derivatives at all. On page 180 the committee is 
moved to recommend: “More emphasis upon the relation of Latin to 
English, especially in the contribution which the study of Latin may 
make to a knowledge of English derivatives and of the principles of 
English grammar.” There is a perfectly definite reason for this dis- 
crepancy between theory and practice, it would seem, on the part of the 
teachers: namely, the fact that so far there has not been published a 
list of derivatives that may be regarded as minimum essentials for 
study and upon which attention may be concentrated. 

In 1921 Thorndike’s Teacher’s Word Book was published by Teach- 
ers College of Columbia University. In preparing it, Professor Thorn- 
dike purposed to furnish a list of the English words most commonly 
used in written and printed matter and to give to each word a number 
as an index to its importance when compared with other words in the 
list. This list was intended primarily to enable teachers in the elementary 
schools to determine which words are of importance for each 
of the grades of school. Owing to the range of sources used and to the 
number of word occurrences counted, the 10,000 words in the list may 
reasonably be regarded as the most commonly used words in the Eng- 
lish language, that is, in written and printed English. I quote from the 
first paragraph in the book: “The Teacher’s Word Book is an alpha- 
betical list of the 10,000 words which are found to occur most widely 
in a count of about 625,000 words from literature for children; about 
3,000,000 words from the Bible and English classics; about 300,000 words 
from elementary school textbooks; about 50,000 words from books about 
cooking, sewing, farming, the trades, and the like; about 90,000 words 
from the daily newspapers; and about 500,000 words from correspond- 
ence. Forty-one different sources were used. A measure of the range 
and frequency of each word’s occurrence is given by the credit-number 
following it. Range answers the question, “How many of these forty- 
one different sources use the word?” or “How widely is the word used?” 
Frequency answers the question, “How often is the word used?” 
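
The distinction between range and frequency drawn in the quotation may be illustrated by a short sketch. The three toy sources and the function name are hypothetical. The quotation does not say how the two measures are combined into the credit-number, so no such combination is attempted here.

```python
from collections import Counter


def range_and_frequency(sources):
    """Map each word to (range, frequency) over a dict of source name -> list of words."""
    range_count = Counter()  # number of sources in which the word appears
    frequency = Counter()    # total occurrences over all sources
    for tokens in sources.values():
        frequency.update(tokens)
        range_count.update(set(tokens))
    return {word: (range_count[word], frequency[word]) for word in frequency}


# Toy sources (hypothetical; Thorndike used forty-one).
sources = {
    "primer": ["the", "dog", "the", "cat"],
    "newspaper": ["the", "market", "the", "the"],
    "letter": ["dear", "cat"],
}
stats = range_and_frequency(sources)
print(stats["the"])  # (2, 5)
print(stats["cat"])  # (2, 2)
```

In this toy case "the" and "cat" have the same range, since each is found in two sources, but very different frequencies.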

Immediately upon the appearance of this book of Thorndike’s, the 
Investigating Committee of the American Classical League suggested 
that, since this list is fairly extensive in that it contains 10,000 Eng- 
lish words, and since these words may be regarded as being, within 
reasonable limits, the 10,000 most commonly used words in the language, 
by analyzing this list it would be possible to isolate the most commonly 
used Latin derivatives in English and list them for the convenience of 
teachers and pupils. 

It happens that it is easily possible to divide the 10,000 words in 
the Teacher’s Word Book roughly into halves of approximately 5,000 
words each on the basis of frequency of use. Accordingly the writer 
of this paper volunteered and started work on the etymological analysis 
of the 5,000 words of highest frequency of occurrence. A little later 
Miss Belle Coulter, then a graduate student in the Department of Latin 
here (Indiana University), now of the Latin Department of the West 
Lafayette High School, started on the etymological analysis of the 
5,000 words whose frequency of use is not so high. This preliminary 
etymological analysis was done under the advice of the Latin Depart- 
ment of Indiana University, and the stenographic work, which was very 
considerable, was furnished by the School of Education of this Uni- 
versity thru its Bureau of Coéperative Research. In this work of 
analysis, for convenience in handling, Skeat’s Etymological Dictionary 
of the English Language, Oxford, edition of 1910, was used as authority. 
In the extremely rare instances in which Skeat’s judgment on a word 
could not be secured, that word was looked up in the New Oxford Dic- 
tionary. 

In beginning work upon this study it seemed to the writer of this 
paper, as a student of the classical languages, that three questions could 
be answered by the study and that the study ought not to be regarded 
as complete until an attempt had been made to answer them. 

Question One: Exactly which English words of these 10,000 in 
most common use are derived from Latin and also which are derived 
from Greek? The answer to this question is, of course, the list of 
Latin derivatives in English sought by the Committee of the Classical 
Investigation, with the addition of a similar list of Greek derivatives. 

Question Two: How do the Latin derivatives and also the Greek 
derivatives among these 10,000 most commonly used English words com- 
pare with the derivatives of other languages in numbers and in fre- 
quency of use? 

Question Three: Does any considerable number of the Latin de- 
rivatives among these 10,000 English words trace back as their source 
to Latin words that are included in Lodge’s list of 2,000 Latin words? 
In other words, Has the Lodge Latin list anything like the same value 
for interpreting commonly used English words derived from Latin that 
it has for promoting progress in the reading and translating of Latin? 

Work on this study was begun in the autumn of 1921, the pre- 
liminary analysis was completed within the year 1922, and the answer 
to the first question—that is the list of English derivatives from Latin 
—could have been published in a partially complete form about that time. 
The writer of this paper, however, retained his own records, obtained 
copies of Miss Coulter’s records, and refused to make any report until 
one could be made that would attempt an answer to the three questions 
proposed. All work on this study since 1922 has been done by the 
writer of this paper, and, since he could devote to it only such time as 
could be spared from teaching, graduate study, and unavoidable and 
time-consuming activities, it is not until now under date of March, 1925, 
that the study is being published as one of the series of Indiana Studies. 
It is also a result of this delay that this study, tho mentioned in the 
earlier reports of the Classical Investigation, is not mentioned in the 
final report. 

The procedure followed in attempting to find answers to the three 
questions proposed will be outlined below. 

First, the preliminary etymological analysis was made as mentioned 
before. The record of this analysis is referred to in the published ac- 
count of the study as the English list. This English list is divided into 
halves containing respectively the 5,000 English words of highest fre- 
quency and the 5,000 words whose frequency of occurrence is not so high. 
Each half is arranged alphabetically. To the right of each word in the 
list is placed: first, the index number indicative of the frequency of its 
use that was assigned to it by Thorndike; next, to the right of this index 
number, the etymological history of the word is indicated; next, still to 
the right, the Latin or Greek source word is written, if the English word 
is derived from either of these languages. No source word is recorded 
from any language except Latin or Greek. By study of this so-called 
English list it was possible to obtain, more or less directly, an answer to 
the three questions proposed. (It was at first planned to publish this 
English list as a part of the study, but because of its bulk it was finally 
omitted. The Latin and Greek lists described below were included, how- 
ever.) 

To have obtained with least work an answer to the first question 
proposed, namely, “Exactly which English words of these 10,000 in most 
common use are derived from Latin and also which are derived from 
Greek?” it would only have been necessary to make lists of the Latin 
and Greek derivatives in alphabetical order, followed in each case by 
the Latin or Greek source word. It seemed, however, to the writer of 
this paper that the results of the study would be in a form much more 
accessible and valuable to high school teachers of Latin, if all the English 
derivatives among those 10,000 words that trace back to one Latin 
source word could be grouped together. As a result of this opinion 
two more word lists were formed, a Latin word list and a Greek word 
list. 

The record, referred to in the published account of this study as the 
Latin list, was formed on the following plan. At the left-hand side of 
the page in a column arranged in alphabetical order are all those Latin 
words which have derivatives among the 10,000 English words under 
consideration. Immediately to the right of each Latin word is a letter 
to indicate the fact, if the Latin word is to be found in Lodge’s list; 
if the word is not found there, the space is left blank. If the Latin word 
is one that Lodge recommends should be learned while the pupil is studying 
Caesar, the letter used is a capital C; if it is a word to be memorized 
during the study of Cicero, a T (for Tullius) indicates that fact; and if 
it is a Vergilian word a V so indicates. Next, to the right of the letter 
just mentioned following each Latin word is a number; this number is 
the sum of the index numbers of all the English words in Thorndike’s 
list that are derived from this one Latin word. Following this number is 
yet another, the number of English derivatives in Thorndike’s list from 
this one Latin word. To the right of this second number is yet a third 
which indicates the rank of the given Latin word when compared with 
all other Latin words in the list on the basis of the number of its derivatives 
and the frequency of their use among the English words of 
the Thorndike list. To the right of these three numbers are all the 
English derivatives from the given Latin word that are found among 
the 10,000 words in the Thorndike list. These derivatives are arranged 
in two columns to indicate whether they are from among the 5,000 words 
of higher frequency or from among the 5,000 of less frequency; in each 
column the derivatives are arranged in alphabetical order. 
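
The plan of one line of the Latin list may be pictured as a record, as in the sketch below. The class and field names are hypothetical, and the sample entry uses invented numbers and an invented division between the two columns. Only the order of the fields and the letters C, T, and V are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Letters used in the Latin list to mark the author group in Lodge's list.
LODGE_MARKS = {"C": "Caesar", "T": "Cicero (Tullius)", "V": "Vergil"}


@dataclass
class LatinListEntry:
    latin_word: str
    lodge_mark: Optional[str]  # "C", "T", "V", or None when the space is left blank
    index_sum: int             # sum of Thorndike index numbers of the derivatives
    derivative_count: int      # number of English derivatives in Thorndike's list
    rank: int                  # rank among all Latin source words in the list
    higher_5000: List[str] = field(default_factory=list)  # higher-frequency half
    lower_5000: List[str] = field(default_factory=list)   # lower-frequency half

    def __post_init__(self):
        if self.lodge_mark is not None and self.lodge_mark not in LODGE_MARKS:
            raise ValueError(f"unknown Lodge mark: {self.lodge_mark!r}")
        # Each column is kept in alphabetical order, as in the list described.
        self.higher_5000 = sorted(self.higher_5000)
        self.lower_5000 = sorted(self.lower_5000)
        if len(self.higher_5000) + len(self.lower_5000) != self.derivative_count:
            raise ValueError("derivative_count does not match the two columns")


# Hypothetical entry: numbers and column placement are invented for illustration.
entry = LatinListEntry(
    latin_word="capio",
    lodge_mark="C",
    index_sum=120,
    derivative_count=4,
    rank=1,
    higher_5000=["receive", "accept"],
    lower_5000=["conceive", "captive"],
)
print(entry.higher_5000)  # ['accept', 'receive']
print(entry.lower_5000)   # ['captive', 'conceive']
```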

The Greek list follows a plan almost identical with that of the 
Latin list. These two lists, Latin and Greek, are the answer to question 
one, as nearly as it has been possible for the writer to find it. 

To find the answer to Question Two, “How do the Latin derivatives 
and also the Greek derivatives among these 10,000 most commonly used 
English words compare with the derivatives of other languages in numbers 
and in frequency of use?” it was only necessary to turn again to 
the English list, to count all the derivatives of each source language, 
to add together and to credit to each source language the index numbers 
of all its derivatives, and then to compare results. These figures 
are contained in seven tables of statistics in the published report of the 
study. A few especially significant figures will be quoted later in this 
paper. 
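
The counting procedure for Question Two may be sketched as follows. The function name, the four sample words, and their index numbers are hypothetical and are chosen only to give round results. The actual figures are those in the published tables.

```python
from collections import defaultdict


def tally_by_language(english_list):
    """english_list: iterable of (word, index_number, source_language) triples."""
    word_totals = defaultdict(int)   # number of derivatives per source language
    index_totals = defaultdict(int)  # sum of index numbers per source language
    for _word, index_number, language in english_list:
        word_totals[language] += 1
        index_totals[language] += index_number
    n_words = sum(word_totals.values())
    n_index = sum(index_totals.values())
    return {
        language: {
            "words": word_totals[language],
            "word_share": 100 * word_totals[language] / n_words,
            "index_sum": index_totals[language],
            "index_share": 100 * index_totals[language] / n_index,
        }
        for language in word_totals
    }


# Hypothetical sample; the index numbers are invented.
sample = [
    ("the", 50, "English"),
    ("house", 30, "English"),
    ("receive", 15, "Latin"),
    ("capture", 5, "Latin"),
]
result = tally_by_language(sample)
print(result["English"]["word_share"], result["English"]["index_share"])  # 50.0 80.0
print(result["Latin"]["word_share"], result["Latin"]["index_share"])      # 50.0 20.0
```

The toy figures show how a language may supply half of the words and yet a much larger or smaller share of the word occurrences.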

In seeking the answer to Question Three, “Do many of the Latin 
derivatives among these 10,000 English words trace back to source 
words that are included in Lodge’s list of 2,000 Latin words?” it was 
only necessary to turn to the Latin list and to count the number of 
Latin source words that are identical with words in Lodge’s list. The 
statistics on this count are in Table 8 in the published record of this 
study. 

It now remains to summarize very briefly the answers, as they have 
been found, to our three questions. 

For the answer to Question One, the Latin and Greek lists will 
have to be consulted by anyone interested for himself, since it would 
not amount to anything except a waste of time to reproduce in this 
paper the list of 4,198 Latin derivatives or the list of 657 Greek deri- 
vatives that occur among Thorndike’s 10,000 English words. 

In answer to Question Two, concerning the comparison of the de- 
rivatives of the various source languages in numbers and frequency 
of use, it was found that the vast majority of the 10,000 words were 
of native English, of Latin, or of Greek origin. The derivatives of 
these three languages comprise 88.32 per cent of the words considered, 
and, on the basis of Thorndike’s index numbers, 91.39 per cent of the 
word occurrences; in other words, the derivatives of these three lan- 
guages are used 91.39 per cent of the time that the total 10,000 words 
are used. The figures for each of the three languages are as follows: 
The native English element comprises 35.15 per cent of the 10,000 words, 
and is in use 50.52 per cent of the time that these 10,000 words are 
used. Words of Latin origin comprise 45.98 per cent of the words and 
are in use 36.11 per cent of the time. Words of Greek origin comprise 
7.19 per cent of the words and are in use 4.76 per cent of the time. It 
will be noted that if the words of native English origin and those derived 
from Latin be taken together, they comprise 81.13 per cent of the 
words considered and 86.63 per cent of the word occurrences. It may 
be noted further that the native English element comprises a little more 
than a third (35.15 per cent) of the words and about half (50.52 per 
cent) of the word occurrences, while the Latin element comprises some- 
thing less than half (45.98 per cent) of the words and a little more than 
a third (36.11 per cent) of the word occurrences. 
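
The combined figures for the three languages can be rechecked by simple addition of the percentages quoted above. The sketch below uses exact decimal arithmetic. The variable names are hypothetical, and the input figures are copied from the text.

```python
from decimal import Decimal as D

# Share of the words considered, per cent, as quoted in the text.
word_share = {"English": D("35.15"), "Latin": D("45.98"), "Greek": D("7.19")}
# Share of the word occurrences, per cent, as quoted in the text.
use_share = {"English": D("50.52"), "Latin": D("36.11"), "Greek": D("4.76")}

three_language_words = sum(word_share.values())
three_language_use = sum(use_share.values())
english_latin_use = use_share["English"] + use_share["Latin"]

print(three_language_words)  # 88.32
print(three_language_use)    # 91.39
print(english_latin_use)     # 86.63
```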

In comparing the frequency of use of the native English element 
and of the Latin element, it will be noted that the difference between 
the frequency of the Latin element 36.11 per cent, and the frequency 
of the English element 50.52 per cent amounts to 14.41 per cent. It 
should be remembered, however, that a considerable part of this differ- 
ence of 14.41 per cent is to be accounted for in the great frequency 
of use of the articles—a, an, the; of demonstrative pronouns—this, that, 
etc.; and of auxiliary verbs such as be, have, and other forms. If it 
be permitted to draw a conclusion from these figures, it may be said 
that a pupil in order to increase his understanding of the exact meaning 
of the vast majority of English words in most common use has need to 
study two languages and only two: his mother-tongue, English, and 
Latin, his grandmother-tongue, so to speak. 

In answer to Question Three, “Do many of the Latin derivatives 
among the 10,000 English words in Thorndike’s Teacher’s Word Book 
trace back to Latin source words that are included in Lodge’s Vocabulary 
of High School Latin?” it was found that 74.08 per cent of the Latin 
derivatives among the 10,000 English words trace back to Latin source 
words that occur among the 2,000 Latin words in Lodge’s list. 

By the time a pupil has completed two years of secondary school 
Latin, provided he has learned Lodge’s list of Caesarian words, he has 
learned the source words of 43.98 per cent of the Latin derivatives in 
Thorndike’s Teacher’s Word Book; and these derivatives are in use 
46.54 per cent of the time that any Latin derivatives among these 10,000 
most frequently used English words are in use. By the close of the 
year’s study of Cicero, the pupil should have added to his knowledge the 
source words of 14.81 per cent more of the Latin derivatives in the 
Thorndike list; and by the close of the study of Vergil there should have 
been added the source words of still 15.29 per cent more of these Latin 
derivatives. A pupil, then, who has had four years of high school 
Latin, provided he has memorized the 2,000 Latin words recommended 
by Lodge, has learned the source words of 74.08 per cent of the Latin 
derivatives among the 10,000 most frequently used English words; and 
these derivatives are in use 77.82 per cent of all the time that Latin 
derivatives in this list of English words are in use. Reckoning in all 
the words in the Thorndike list, whether of Latin origin or not, the 
pupil who has mastered Lodge’s 2,000 Latin words has learned the 
source words of 33.96 per cent of the 10,000 most frequently used 
English words and of those in use 28.10 per cent of the time. 

In the course of his four years of high school Latin a pupil will 
meet, but ordinarily will not be required to learn, the source words of 
still 11.71 per cent more of these Latin derivatives. These have not 
been taken into account in the reckoning above. If they were, it could 
be said that at the end of his high school course in Latin the pupil 
will have met the source words of 85.79 per cent of the most commonly 
used Latin derivatives in English. 
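
The growth of a pupil's stock of source words from stage to stage is a running sum of the percentages given above, and may be sketched as follows. The stage labels are hypothetical. The percentages are those stated in the text for the share of Latin derivatives whose source words are learned or met at each stage.

```python
from decimal import Decimal as D
from itertools import accumulate

# Per cent of the Latin derivatives added at each stage, as stated in the text.
stages = [
    ("Caesar words memorized", D("43.98")),
    ("Cicero words memorized", D("14.81")),
    ("Vergil words memorized", D("15.29")),
    ("Words met but not memorized", D("11.71")),
]

running = list(accumulate(share for _, share in stages))
for (label, share), total in zip(stages, running):
    print(f"{label}: +{share} -> {total}")

# The third running total is the 74.08 per cent credited to Lodge's 2,000 words;
# the fourth is the 85.79 per cent met in the whole high school course.
assert running[2] == D("74.08")
assert running[3] == D("85.79")
```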

It would seem, then, that Lodge’s 2,000 Latin words are of almost 
as much value for giving a knowledge of the Latin element among the 
most commonly used English words as they are for giving a ready 
reading ability in Latin. 

It would seem, further, that the vocabulary of the traditional Latin 
read in the secondary school is as valuable as could be hoped for in 
giving a knowledge of the source words of the Latin element among 
the most commonly used English words, since even the minimum part 
of it that should be memorized (that is, the Lodge list) includes the 
source words of three-fourths of the Latin derivatives among the most 
commonly used English words and since, taken as a whole, it includes 
the source words of well over four-fifths of these Latin derivatives. 

Finally, to bring the wording of this conclusion into harmony with 
the title suggested for this paper, the statement should be made that 
the measure of the utility of the 2,000 Latin words emphasized in Lodge’s 
Vocabulary of High School Latin for interpreting the Latin element 
among the 10,000 most commonly used English words contained in Thorn- 
dike’s Teacher’s Word Book is almost exactly 75 per cent. 







TABLE I.* SUMMARIZING TABLE. COMPARISON OF LANGUAGES AMONG THE 
ENTIRE 10,000 WORDS IN THORNDIKE’S TEACHER’S WORD BOOK 

Language Groups            Derivative   Percentage   Index Number   Percentage 
                           Word Total                Total 

English..................     3,209       35.15         95,101        50.52 
Latin....................     4,198       45.98         67,992        36.11 
Greek....................       657        7.19          8,962         4.76 
Scandinavian.............       376        4.12          7,465         3.96 
[illegible]..............       198        2.17          2,909         1.54 
Other Teutonic...........        90         .98          1,336          .70 
Other Indo-European......       239        2.62          3,177         1.69 
Other languages..........       163        1.78          1,304          .69 
Total....................     9,130                    188,246 
Omitted words............       870 
  (Proper names, etc.) 
Thorndike’s total........    10,000 











*Table I here is identical with Table VII in Indiana University Study No. 65. March, 1925. 


TABLE II.* TABLE SHOWING RELATION OF LODGE’S LATIN LIST TO THE SOURCE 
WORDS OF THE LATIN DERIVATIVES IN THE 10,000 ENGLISH 
WORDS IN THORNDIKE’S TEACHER’S WORD BOOK 

Author                      Latin    Per cent   English       Per cent      Sums of   Per cent 
                            Words    of Total   Derivatives   of Total      Index     of Total 
                                     Latin                    English       Values    Sums of 
                                     Words                    Derivatives             Index Values 

Caesar...................     290     26.13       1,911         43.98       30,857      46.54 
Cicero...................     141     12.70         644         14.81       10,416      15.71 
Vergil...................     152     13.69         665         15.29       10,325      15.57 
Lodge....................     189     17.03         509         11.71        6,503       9.80 
Not in High School Latin.     338     30.45         618         14.21        8,197      12.36 

Total....................   1,110                 4,347                     66,298 




















*Table II is identical with Table VIII in Indiana University Study No. 65. March, 1925. 





INTERPRETATION OF TABLE II 


The table showing the relation of the Lodge Latin list to the list 
of the source words of the Latin derivatives among the 10,000 English 
words in Thorndike’s Teacher’s Word Book is to be interpreted as fol- 
lows: The figures, 1,110, at the bottom of the first column express the 
number of Latin source words from which come the Latin derivatives 
in the Teacher’s Word Book. Now reading the first line across the top 
of the table we are to understand that 290 out of the 1,110 source words 
mentioned above are to be found among the Latin words in the Caesar 
list in Lodge’s Vocabulary of High School Latin. It should be noted 
that more of the 1,000 words in Lodge’s Caesar list than the 290 just 
mentioned are really related to the source words in our list because 
Lodge includes decipio, incipio, recipio, etc., as well as the simple form, 
capio, while in our list compound forms are usually not separately 
noted, but all derivatives of the capio family, for example, are traced 
back to the simple form, capio. Continuing our reading, the 290 words 
common to Lodge’s Caesar list comprise 26.13 per cent of the total of 
1,110 Latin source words in our list. These 290 Latin words have 1,911 
English derivatives among the words in Thorndike’s Teacher’s Word 
Book. These derivatives comprise 43.98 per cent of all the Latin de- 
rivatives in the Teacher’s Word Book. The sum of the frequency index 
numbers of all the English words in Thorndike’s Word Book that are 
derived from these 290 Latin words is 30,857, and this sum is 46.54 per 
cent of the total sum of all the frequency index numbers of all the 
words in Thorndike’s Word Book that are derived from Latin. 
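
The way in which the percentage column for Latin source words is formed may be sketched as follows. The row labels and variable names are hypothetical conveniences. The counts are those of the first column of Table II, whose total of 1,110 is stated in the text.

```python
# Latin source words in each row of Table II (first column of the table).
latin_source_words = {
    "Caesar": 290,
    "Cicero": 141,
    "Vergil": 152,
    "Lodge, outside the 2,000": 189,
    "Not in High School Latin": 338,
}

total = sum(latin_source_words.values())
shares = {row: round(100 * n / total, 2) for row, n in latin_source_words.items()}

print(total)             # 1110
print(shares["Caesar"])  # 26.13
```

Each of the other percentage columns is presumably formed in the same manner from its own column of counts. That is an assumption of this sketch, since the text works through the first line only.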

The figures in the row following the word Lodge are the statistics 
on those words in our list that are included in Lodge’s Vocabulary of 
High School Latin but are not in his list of 2,000 important words. The 
figures in the row headed Not in High School Latin are statistics on 
those words of our list that are not found in Lodge’s Vocabulary. Forty- 
two of these last-mentioned words, whose derivatives number 68, are 
non-classical Latin and are not to be found in Harper’s Dictionary. 




























